
Abstract

Background:

Medication harm is a significant healthcare challenge in hospitalised adult patients. Machine learning (ML) approaches offer the potential to improve prediction accuracy for medication harm by capturing complex relationships among clinical risk factors that traditional statistical models may not detect.

Objective:

To develop and evaluate ML models for predicting medication harm in hospitalised adult patients.

Design:

ML study involving secondary use of a prospectively collected hospital cohort dataset.

Methods:

This study used data from 279 adult patients admitted to general medical and geriatric wards of a tertiary hospital, among whom 40 experienced 51 medication harm events. Eight ML models were trained and evaluated for identifying patients at risk of medication harm. Medication harm cases were identified through detailed chart reviews, trigger tools, voluntary incident reporting, and International Classification of Diseases version 10 discharge coding. Data were pre-processed with missing values imputed using median imputation. Ten predictive features were selected using recursive feature elimination and clinical expert opinion. Models were trained using stratified 10-fold cross-validation with an 80/20 train-test split. Class imbalance was addressed using an oversampling approach.

Results:

A random forest model demonstrated the highest performance, achieving an area under the receiver operating characteristic curve of 0.76, precision of 0.50, recall of 0.62, F1 score of 0.54, accuracy of 0.86, specificity of 0.90, and an area under the precision-recall curve of 0.47. Predictive features of importance included length of stay, depression, dementia, insulin use, number of medications (⩾15), age (⩾65), opioid use, and antibiotic use.

Conclusion:

This study highlights the potential of ML models to predict medication harm, enabling early identification of high-risk patients for preventive interventions. Interdisciplinary collaboration is essential in developing robust, clinically relevant models that can be used to improve patient safety.

Plain language summary

Using machine learning to predict medication harm in hospitalised adult patients

Why was the study done? Medication harm is a common and often preventable issue in hospitals. Early identification of patients at risk can enable timely interventions by healthcare teams. This study investigated the potential of machine learning to predict which adult patients are most likely to experience medication-related harm during hospitalisation.

What did the researchers do? Researchers prospectively collected clinical and demographic data from 279 adult patients admitted to general and geriatric wards in an Australian hospital. Medication harm was identified using multiple approaches, including chart reviews, trigger tools, incident reports, and discharge diagnostic codes. Eight machine learning models were trained and evaluated to predict the likelihood of medication-related harm.

What did the researchers find? Of the eight models evaluated, the random forest model performed best. It correctly identified most patients who were not at risk and showed moderate ability to distinguish between at-risk and low-risk individuals. Additionally, the model showed good overall accuracy. However, its ability to correctly identify patients who experienced harm was moderate, with some patients incorrectly flagged as high-risk. Key predictive variables for medication-related harm included depression, dementia, insulin use, antibiotic use, opioid use, number of medications (⩾15), age (⩾65), and length of stay.

What do the findings mean? This study provides proof of concept that machine learning can be used to predict medication harm in hospitalised adult patients, even with a relatively small dataset. Further research involving larger and more diverse patient populations is needed to improve model performance and reliability before it can be used in routine clinical care. These findings support the development of tools to help clinicians effectively identify and prioritise patients at high risk of medication harm, enabling timely interventions to improve patient safety in hospitals.

Keywords

machine learning, medication harm, patient safety, predictive model

Introduction

Medication harm is a significant healthcare challenge that manifests in potentially avoidable morbidity and mortality.^1–3 Preventing harm from inappropriate medication use achieves better patient outcomes and saves limited healthcare resources.^4,5 About one in five inpatients experience medication harm, with a third of these cases deemed preventable.⁶ Examples include adverse effects from antidiabetic drugs, anticoagulants, and antihypertensives, which can be mitigated through comprehensive medication review and patient monitoring.⁷ However, heavy workloads and time pressures in busy hospitals often limit the effectiveness of these strategies.^8–10

Automated risk prediction methods may allow clinicians to systematically identify patients at high risk of medication harm and implement targeted preventive strategies.^11,12 Several prior studies have explored risk prediction models for medication harm using traditional regression models or machine learning (ML) methods. For example, the Adverse Drug Reactions and Events in an Ageing PopulaTion Risk Prediction (ADAPTiP) regression tool, using retrospective data, demonstrated the feasibility of prediction models for medication harm applied to the older adult (⩾65 years) population.¹³ Another study applied ML methods to a large retrospective tertiary hospital patient dataset across all ages in which medication harm affected only 5.2% of patients, resulting in class imbalance.¹⁴ The five models in that study performed well at identifying patients not at risk (high negative predictive value, ~97.5%) but had low precision (positive predictive value (PPV), ~10%) for medication harm, highlighting challenges in accurately predicting risk of rare outcomes.¹⁴

Although ML-based prediction models have demonstrated superior performance over conventional statistical methods, such as logistic regression, in various in silico clinical applications,^15–18 models of either type may not perform well in real-world settings or exert clinical impact.¹⁹ Nevertheless, ML models can capture complex, non-linear relationships and interactions among patient characteristics, medication use patterns, laboratory test results, and history of adverse drug reactions,^20,21 and may better predict medication harm and support targeted interventions.

The Adverse Inpatient Medication Event and Frailty (AIME-Frail) model was a logistic regression model developed using retrospectively extracted data from electronic health records (EHRs) of hospitalised adult patients with completed episodes of care, in which medication harm was ascertained using International Classification of Diseases (ICD) codes.²² While this model demonstrated good discrimination (area under the receiver operating characteristic curve [AUROC] of 0.79), it was limited by low precision (PPV of 0.14) due to the relative infrequency of ICD-coded medication harm within the derivation dataset. We hypothesised that applying ML to a prospectively collected dataset from 279 patients, with more comprehensive measures of harm, could yield better results. Hence, this study aimed to develop and evaluate different ML models to predict medication harm in hospitalised adults, while attempting to address challenges of class imbalance experienced in prior studies.

Methods

Study design

Model development and evaluation were conducted between 13 October 2024 and 25 June 2025. The study followed Steyerberg et al.’s seven-step framework to guide predictive model development: (1) defining the research question and performing an initial data inspection, (2) coding and preparing predictors, (3) specifying the model structure, (4) estimating model parameters, (5) evaluating model performance, (6) conducting internal validation, and (7) presenting the model and its results.²³ Additionally, the reporting of this study conforms to the TRIPOD-AI guidelines.²⁴

Data collection and preparation

This study used a prospective dataset from a controlled trial evaluating the clinical impact of the AIME-Frail model, which has been reported separately.²⁵ In that study, participants were included if they were newly admitted to one of four geriatric wards or two general medical teams at Princess Alexandra Hospital, were receiving active inpatient care during the study period, and had provided informed consent either personally or through a substitute decision-maker. Patients were excluded if they had been admitted to the same study wards or teams prior to study commencement, received end-of-life care, were discharged before consent could be obtained, or declined participation.

The extracted data comprised patient demographics, clinical history, lifestyle factors, and hospitalisation data, collected prospectively from EHRs on admission to general medicine or geriatric rehabilitation wards between 2 May 2022 and 3 July 2023. Features of importance identified in the original AIME-Frail model included high-risk medications (categorised according to the APINCH acronym²⁶), length of hospitalisation, and frailty, the latter assessed using the Hospital Frailty Risk Score,²⁷ which was calculated retrospectively. In this prospective study, the Clinical Frailty Scale (CFS),²⁸ ranging from 1 = very fit to 9 = terminally ill, was used as the method of choice as per local hospital policy. The only variable with missing data was serum creatinine (missing for 2.2% of the cohort). Median imputation was applied, given the low rate of missingness and the right-skewed distribution of creatinine values, under the assumption that the values were missing at random.

The primary outcome measure of medication harm was defined as any negative patient outcome or injury related to medication use.²⁹ To ensure comprehensive identification of all possible harm cases, multiple data sources were used, comprising regular chart review by researchers throughout the admission, a modified trigger tool that identified diagnoses such as falls as indicators of potential harm events and associated suspected drug(s),³⁰ voluntary reporting of harm by clinical staff, and discharge coding (ICD-10) specific to medication harm.³¹ This approach reduced the risk of bias or missed cases that might arise from relying on a single data source. We only included cases of harm rated by the researchers as certain, likely, or possible to be caused by medications and not attributable to other factors, according to the World Health Organization-Uppsala Monitoring Centre (WHO-UMC) causality categories.³² Additionally, an expert panel which included a geriatrician and two pharmacists (one from the general medical team and one from the geriatric team) independently verified the accuracy of the causality categorisation process for each case of medication-attributable harm based on a majority vote.

Sample size considerations

The study was constrained by the availability of data from 279 patients, of whom 40 experienced the outcome of interest (event rate = 14.3%). A minimum of 189 patients was calculated as necessary to estimate the outcome prevalence with a ±5% margin of error and 95% confidence.³³ Although the events-per-variable (EPV = 4) was lower than commonly recommended thresholds (EPV = 10–20), the total sample size exceeded the minimum required to estimate outcome prevalence. As a post hoc diagnostic check of model fit on the pooled development and test dataset, likelihood-based pseudo-R² measures were derived: Cox-Snell R² = 0.3592, maximum Cox-Snell R² = 0.5605, and Nagelkerke R² = 0.6409. These values suggest reasonable apparent model fit. However, they do not mitigate the limitations imposed by the small number of events and low EPV.
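The arithmetic behind these figures can be checked with a short sketch. It assumes the standard normal-approximation formula for estimating a proportion (the source cites a reference for the calculation but does not state the formula) and the usual definition of the maximum Cox-Snell R² from the intercept-only log-likelihood. The function names are illustrative, not taken from the study code.

```python
import math

def prevalence_sample_size(p, margin, z=1.96):
    """Minimum n to estimate a proportion p within +/- margin (normal approximation)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def max_cox_snell_r2(n_events, n_total):
    """Upper bound of Cox-Snell R^2, derived from the intercept-only log-likelihood."""
    p = n_events / n_total
    ll_null = n_events * math.log(p) + (n_total - n_events) * math.log(1 - p)
    return 1 - math.exp(2 * ll_null / n_total)

n_total, n_events, n_features = 279, 40, 10   # counts reported in the text
event_rate = n_events / n_total               # about 0.143

min_n = prevalence_sample_size(event_rate, margin=0.05)
epv = n_events / n_features                   # events per variable
r2_max = max_cox_snell_r2(n_events, n_total)
nagelkerke = 0.3592 / r2_max                  # 0.3592 is the reported Cox-Snell R^2

print(min_n, epv, round(r2_max, 4), round(nagelkerke, 4))
```

Under these assumptions the sketch reproduces the reported minimum of 189 patients, an EPV of 4, and the relationship between the three pseudo-R² values (Nagelkerke R² is the Cox-Snell R² divided by its maximum).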

Exploratory data analysis

Following data processing, a total of 42 features were chosen for model input; their frequency distributions are depicted in Supplemental Appendix A. Three features, in addition to being analysed as continuous variables, were also analysed as binarised categorical variables: age, medication count, and frailty scores, categorised according to clinically relevant threshold values. For age, participants were grouped into <65 years (30.5%) and ⩾65 years (69.5%) according to the accepted definition of older patients.³⁴ Medication use was divided into <15 medications (70.3%) and ⩾15 medications (29.7%), the latter defining excessive polypharmacy.³⁵ Frailty scores were grouped into <5 points (34.1%) and ⩾5 points (65.9%) on the CFS, where scores <5 represent fit or pre-frailty status.²⁸ All other categorical features comprised patients who did or did not have the medical condition in question or did or did not receive the stated medication.

Supplemental Appendix B depicts a feature-to-feature correlation heatmap used to assess multicollinearity in the feature correlation matrix. Despite weak, non-significant correlations and potential collinearity, “AC” and “VTEPPX” were retained as potential model features due to their deemed clinical importance. Specifically, “AC” captures the broader use of anticoagulants, including both oral and intravenous forms, while “VTEPPX” specifically relates to anticoagulants used at prophylactic doses. Since medication harm could manifest differently from prophylactic versus broader anticoagulant use, both features provide complementary information. The statistical associations between features and medication harm are detailed in Supplemental Appendix C. As these were exploratory analyses focused on providing insight into the relationships between features and medication harm, they were not used to guide feature selection.

Feature analysis and selection

Correlation analysis was conducted to identify feature relationships with medication harm, using Pearson correlation (r) for continuous features and Cramér’s V for categorical features, with chi-square tests used to determine statistical significance as denoted by p values < 0.05. Recursive feature elimination (RFE) was performed to generate a shortlist of features based on model contribution. The shortlisted features were then reviewed through an iterative consensus process involving two data scientists (AA, HM) with experience in digital health data, and two clinical pharmacists (NF, JL) with expertise in medication safety. Final feature selection was guided by this multidisciplinary agreement, prioritising clinical relevance and feature availability within routinely collected digital hospital data, instead of relying solely on statistical correlations.
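As an illustration of the categorical association measure, the following minimal sketch computes the chi-square statistic and Cramér's V for a contingency table using only the standard library. The cell counts are hypothetical: only the margins (52 patients with depression, 40 patients with harm, 279 in total) come from the paper. No continuity correction is applied here, whereas some library routines apply one by default for 2 × 2 tables, so values may differ slightly from those in the supplement.

```python
import math

def cramers_v(table):
    """Return (Cramér's V, chi-square statistic) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)), chi2

# Rows: depression yes / no; columns: harm yes / no.
# Cell counts are HYPOTHETICAL; only the row and column totals match the paper.
table = [[15, 37],
         [25, 202]]

v, chi2 = cramers_v(table)
# For a 2 x 2 table (1 degree of freedom), chi2 > 3.841 corresponds to p < 0.05.
significant = chi2 > 3.841
print(round(v, 3), round(chi2, 2), significant)
```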

Feature discretisation

Feature discretisation was conducted iteratively to ensure the chosen features were correctly interpreted by the model during training and to optimise performance. RFE was used as an advisory step to rank candidate predictors and initially identified age, number of medications, CFS, length of stay, and creatinine. Although frailty measured as CFS was identified by RFE as a predictor, we excluded it to enhance generalisability across diverse hospital settings given that CFS is not routinely captured in EHRs and typically requires individual clinical assessment. Features identified as clinically important in the original AIME-Frail model,²² including length of stay, antibiotics, insulin and opioids, were retained based on clinical relevance rather than statistical associations alone. The final set of 10 features used in the ML models comprised those four features along with dementia, depression, antiplatelets, number of medications ⩾15, age ⩾65 and creatinine. These were selected based on both the RFE shortlist and multidisciplinary expert agreement regarding clinical relevance and practicality.

Model development

Eight models were developed using scikit-learn³⁶: random forest,³⁷ Light Gradient Boosting Machine (LightGBM),³⁸ Extreme Gradient Boosting (XGBoost),³⁹ logistic regression,⁴⁰ Categorical Boosting (CatBoost),⁴¹ gradient boosting classifier,⁴² multi-layer perceptron (MLP),⁴³ and Support Vector Machine (SVM).⁴⁴ Based on prior studies,^45–47 these models were chosen for their general ability to handle high-dimensional feature spaces and their flexibility in capturing both linear and non-linear relationships, which are important for modelling complex clinical outcomes such as medication harm.

Preprocessing steps to prepare data for model training included label encoding (converting categorical variables into numerical form), scaling (adjusting data so that features with larger values did not disproportionately influence the model), and one-hot encoding (creating separate columns for each category to enable easier processing).⁴⁸
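The scaling and one-hot encoding steps can be illustrated with a small standard-library sketch. The source does not say which scaling method was used, so z-score standardisation is an assumption here, and the `ward` variable and all values are hypothetical examples rather than study data.

```python
from statistics import mean, pstdev

def standardise(values):
    """Rescale values to zero mean and unit variance (z-scores, population SD)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def one_hot(values):
    """Return the sorted categories and one 0/1 column per category for each row."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]

creatinine = [65, 79, 112, 140]                          # hypothetical values, µmol/L
ward = ["general", "geriatric", "general", "geriatric"]  # hypothetical categorical feature

scaled = standardise(creatinine)
categories, encoded = one_hot(ward)
print([round(s, 2) for s in scaled])
print(categories, encoded)
```

After scaling, a feature measured in large units (such as creatinine) no longer dominates features coded as 0/1, which matters mainly for distance- and gradient-based models such as SVM and MLP.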

The original dataset comprised 279 actual patient cases, including 51 medication harm events arising from 40 patients who experienced one or more events. All modelling was conducted at the patient level, with any patient experiencing ⩾1 harm event classified as a harm case. The dataset was split into training and testing sets using an 80/20 ratio, resulting in 223 training patient cases, 32 of whom experienced medication harm, and 56 testing patient cases, 8 of whom experienced medication harm. To address class imbalance due to the low prevalence of medication harm, adaptive synthetic sampling (ADASYN) was applied,⁴⁹ which generated an additional 164 patient cases with harm, increasing the total number of harm cases in the training set to 196.

Eight ML models were trained on the training dataset (80% of total cohort), which, after adding the ADASYN oversampling data, comprised a total of 387 patient cases: 191 actual patients without medication harm (unchanged) and 196 with harm (32 original cases; 164 ADASYN cases). The features of the ADASYN-generated minority samples approximate the feature distributions seen in the actual harm cases, supporting model training while leaving the original dataset unchanged.⁴⁹
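The case counts described above can be traced with a minimal sketch of a stratified 80/20 split. This is a simple per-class shuffle-and-round scheme written for illustration, not the library routine used in the study, and the number of synthetic cases (164) is taken from the text rather than generated by an oversampling algorithm.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split indices so that each class contributes test_fraction of its members to the test set."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(test_fraction * len(idx))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx

labels = [1] * 40 + [0] * 239          # 40 harm cases, 239 non-harm cases (patient level)
train_idx, test_idx = stratified_split(labels)

train_harm = sum(labels[i] for i in train_idx)
test_harm = sum(labels[i] for i in test_idx)
train_no_harm = len(train_idx) - train_harm

n_synthetic = 164                      # synthetic harm cases reported in the text
train_harm_balanced = train_harm + n_synthetic
train_total_balanced = train_no_harm + train_harm_balanced

print(len(train_idx), train_harm, len(test_idx), test_harm)
print(train_no_harm, train_harm_balanced, train_total_balanced)
```

The test indices are left untouched by the oversampling step, so the 56 test patients remain real cases at the original prevalence.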

Model training was conducted using stratified 10-fold cross-validation to ensure similar class distribution across folds. Hyperparameter tuning was nested within the cross-validation procedure using GridSearchCV,^50,51 with the optimal hyperparameters selected based on the highest mean AUROC across folds.

Model evaluation

The performance of each model was evaluated on the 20% hold-out test dataset. The model performance metrics comprised AUROC, area under the precision-recall curve (AUPRC), precision (PPV), recall (also termed sensitivity), F1 score (harmonic mean of precision and recall), accuracy (proportion of correct predictions among all predictions), and specificity. A random seed of 42 was set for reproducibility. The ADASYN oversampling method used to address class imbalance was applied exclusively to the training dataset and withheld from the test dataset.⁵²
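The threshold-dependent metrics all derive from the four cells of a confusion matrix, as the sketch below shows. The counts used (5 true positives, 3 false negatives, 5 false positives, 43 true negatives) are an inference: they are the only integer counts consistent with 8 harm and 48 non-harm test patients and the true and false positive rates reported for the random forest model (0.625 and 0.104). They are not stated in the source.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-dependent metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)                    # positive predictive value
    recall = tp / (tp + fn)                       # sensitivity, true positive rate
    specificity = tn / (tn + fp)                  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }

# INFERRED counts for the 56-patient test set (see lead-in); not reported directly.
metrics = classification_metrics(tp=5, fp=5, tn=43, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these inferred counts, precision, recall, specificity, and accuracy agree with the reported values after rounding. The F1 score works out to about 0.56, slightly above the reported 0.54; the source does not say how the reported figure was derived, so the counts should be treated as illustrative.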

Optimal classification thresholds for predicting medication harm were determined using the Youden index, which balanced true positive and false positive rates.^53,54 95% confidence intervals (CI) around the metric estimates were derived using bootstrapping on the testing dataset (n = 56) with 1,000 samples. A confusion matrix comparing actual versus predicted cases of harm and non-harm was also constructed. Decision curve analysis was employed to evaluate the net clinical benefit of the model across a range of threshold probabilities.
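Threshold selection with the Youden index can be sketched as follows: every observed predicted probability is tried as a cut-off, and the one that maximises J = sensitivity + specificity − 1 is kept. The labels and probabilities below are hypothetical toy values, and the function name is illustrative.

```python
def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(y_prob)):
        # A case is predicted positive when its probability is at or above the threshold.
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j

# HYPOTHETICAL labels (1 = harm) and predicted probabilities for ten patients.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80]

threshold, j = youden_threshold(y_true, y_prob)
print(threshold, round(j, 3))
```

Because the Youden index weights sensitivity and specificity equally, it takes no account of the relative clinical cost of missed harm versus false alarms, which is one reason a decision curve analysis is a useful complement.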

Model explainability

To make model outputs more explainable to clinicians, SHapley Additive exPlanations (SHAP) summary plots were constructed to identify important features contributing to risk predictions.⁵⁵

Statistical analysis

In addition to the correlation and feature selection procedures described above, univariate analyses were also conducted to quantify the individual associations between predictors and the outcome, including correlation coefficients, R² values and tests of statistical significance. Further statistical analyses were conducted to assess model adequacy, performance, and clinical utility. Model adequacy was evaluated through post hoc logistic regression to determine dataset sufficiency for multivariable modelling, and by examining model stability across multiple resampled training and test datasets via bootstrapping at sample fractions of 0.5, 0.75, and 1.0. Model performance was assessed internally using stratified 10-fold cross-validation during model development, while the final model performance and clinical utility were evaluated on the 20% hold-out test dataset using standard performance metrics and decision curve analysis.

The ML framework was implemented in Python 3.10.0 using Jupyter Notebook.^56,57 Statistical analyses were performed using the libraries pandas,⁵⁸ numpy,⁵⁹ scipy.stats⁶⁰ and statsmodels.⁶¹ Visualisations were produced with matplotlib.pyplot,⁶² and seaborn.⁶³ ML model development and preprocessing were implemented using scikit-learn,⁶⁴ with oversampling handled using imblearn,⁶⁵ and model interpretability assessed using shap.⁶⁶

Results

Participant characteristics

Table 1 outlines the baseline characteristics of the study population, comprising 279 patients. Participants had a median age of 74 years (interquartile range (IQR): 61–83). The median number of medications used was 11 (IQR: 8–15), and the median length of hospital stay was 20 days (IQR: 9–44). Median serum creatinine level was 79 µmol/L (IQR: 65–112) and median CFS was 5 (IQR: 4–6), indicating mild frailty.

Table 1.

Participant characteristics.

Population characteristics	N = 279
Age (years), median (IQR)	74 (61–83)
Medications (number), median (IQR)	11 (8–15)
Gender (Male) (%)	151 (54)
Aboriginal/Torres Strait Islander (%)	7 (2.5)
LoS (days), median (IQR)	20 (9–44)
Clinical Frailty Scale (at admission), median (IQR)	5 (4–6)
Creatinine (μmol/L), median (IQR)	79 (65–112)
Creatinine, missing data (%)	6 (2.2)
Medical conditions (%)
Hypertension	169 (60.6)
Cardiovascular	114 (40.9)
Diabetes (Type 2)	62 (22.2)
Stroke	53 (19.0)
Depression	52 (18.6)
Diabetes (Type 1)	31 (11.1)
Asthma	28 (10.0)
COPD	28 (10.0)
Psychiatric condition	19 (6.8)
Dementia	19 (6.8)
Parkinson’s disease	11 (3.9)
Liver disease	12 (4.3)
Cancer	9 (3.2)
Medication class (%)
Antihypertensive	164 (58.8)
Lipid lowering medications	136 (48.7)
Laxatives	115 (41.2)
Opioids	110 (39.4)
Antiplatelets	100 (35.8)
Antibiotics	95 (34.1)
Antidepressant	91 (32.6)
Insulin	65 (23.3)
Antipsychotics	29 (10.4)
Immunosuppressant	24 (8.6)
Potassium (IV)	13 (4.7)
Chemotherapy	3 (1.1)
Anticoagulants (AC) Oral and IV	205 (73.5)
Enoxaparin	106 (38.0)
Heparin	29 (10.4)
Dalteparin	16 (5.7)
Warfarin	3 (1.1)
Direct-acting Oral Anticoagulant (DOAC)	51 (18.3)
Prophylaxis dose of anticoagulant (VTEPPX)	191 (68.5)
Treatment dose of anticoagulant (VTETX)	14 (5.0)
Treatment dose of oral anticoagulant (OACTX)	17 (6.1)
Age categories
<65 years old	85 (30.5)
⩾65 years old	194 (69.5)
Medication categories
<15 number of medicines used	196 (70.3)
⩾15 number of medicines used	83 (29.7)
Frailty categories
<5 points of clinical frailty scale	95 (34.1)
⩾5 points of clinical frailty scale	184 (65.9)

AC, Anticoagulants; COPD, Chronic Obstructive Pulmonary Disease; DOAC, Direct-acting Oral Anticoagulant; IQR, Interquartile range; LoS, Length of Stay; meds, medicines; Min-Max, Minimum-maximum; OACTX, Treatment dose of oral anticoagulants; VTEPPX, Prophylaxis dose of anticoagulant; VTETX, Treatment dose of anticoagulant.

Males comprised 54% of the cohort, with 2.5% identified as Aboriginal and/or Torres Strait Islander. The most frequent medical conditions were hypertension (60.6%), cardiovascular disease (40.9%), type 2 diabetes (22.2%), stroke (19.0%), and depression (18.6%).

For chronic medications, 58.8% of participants used antihypertensives, 48.7% used lipid-lowering medications, 39.4% used opioids, 34.1% used antibiotics, and 32.6% used antidepressants. Other commonly used medications included insulin (23.3%), antiplatelet drugs (35.8%) and laxatives (41.2%).

Anticoagulant use during hospitalisation was common, with 73.5% of patients receiving an anticoagulant that was administered orally or intravenously. Most anticoagulation was provided at prophylactic-dose levels for venous thromboembolism prevention (VTEPPX), received by 68.5% of patients, while treatment-dose anticoagulation for venous thromboembolism (VTETX) was used in 5.0% and treatment-dose oral anticoagulants (OACTX) were used in 6.1% of patients.

Medication harm

Table 2 lists the 51 observed instances of medication harm identified in 40 of the 279 patients, excluding the 164 additional cases generated through ADASYN. The most common medication harms were diarrhoea, constipation, and nausea/vomiting, each accounting for 7 events (13.7%). Bleeding or bruising occurred in 5 events (9.8%), while hypoglycaemia, acute kidney injury, and confusion each occurred in 4 events (7.8%). Falls occurred in 3 events (5.9%); neutropenia and hyperkalaemia each occurred in 2 events (3.9%). Additionally, low white blood cell count, hyponatraemia, oedema, rash, thrombocytopenia, and dyspepsia each occurred in 1 event (2.0%).

Table 2.

Distribution of actual medication harm events.

Types of medication harm (n = 51)	Frequency (%) of harm events
Diarrhoea	7 (13.7)
Constipation	7 (13.7)
Nausea/vomiting	7 (13.7)
Bleeding/bruising	5 (9.8)
Hypoglycaemia	4 (7.8)
Acute kidney injury	4 (7.8)
Confusion	4 (7.8)
Fall	3 (5.9)
Neutropenia	2 (3.9)
Hyperkalaemia	2 (3.9)
Low white blood cell count	1 (2.0)
Hyponatraemia	1 (2.0)
Oedema	1 (2.0)
Rash	1 (2.0)
Thrombocytopenia	1 (2.0)
Dyspepsia	1 (2.0)

Model evaluation

Based on AUROC, the eight models were ranked from best to worst, as follows: random forest, CatBoost, XGBoost, gradient boosting classifier, logistic regression, SVM, LightGBM, and MLP (Table 3). The optimised random forest model demonstrated the strongest overall performance across metrics, achieving an AUROC of 0.76 (95% CI: 0.54–0.94), AUPRC of 0.47 (95% CI: 0.15–0.79), precision (PPV) of 0.50 (95% CI: 0.17–0.82), recall (true positive rate) of 0.62 (95% CI: 0.25–1.00), F1 score of 0.54 (95% CI: 0.20–0.80), accuracy of 0.86 (95% CI: 0.77–0.95), and specificity (true negative rate) of 0.90 (95% CI: 0.80–0.97). This corresponded to a false positive rate of 0.10 and a false negative rate of 0.38. Figure 1 depicts the ROC curve and confusion matrix. Hyperparameter optimisation did not improve performance: the default parameters were selected as optimal, so the final model used these settings, including 100 trees, the Gini criterion, and sqrt for the maximum number of features.

Table 3.

Summary of model performance.

Model	AUROC (CI)	Precision/PPV (CI)	Recall/sensitivity (CI)	F1 score (CI)	Accuracy (CI)	Specificity (CI)	AUPRC (CI)
Random forest	0.76 (0.54–0.94)	0.50 (0.17–0.82)	0.62 (0.25–1.00)	0.54 (0.20–0.80)	0.86 (0.77–0.95)	0.90 (0.80–0.97)	0.47 (0.15–0.79)
CatBoost	0.74 (0.49–0.94)	0.41 (0.13–0.73)	0.62 (0.25–1.00)	0.49 (0.15–0.76)	0.82 (0.71–0.91)	0.85 (0.75–0.94)	0.50 (0.16–0.82)
XGBoost	0.73 (0.48–0.94)	0.46 (0.14–0.78)	0.62 (0.25–1.00)	0.51 (0.17–0.79)	0.84 (0.75–0.93)	0.88 (0.78–0.96)	0.50 (0.14–0.82)
GBC	0.73 (0.50–0.94)	0.38 (0.13–0.65)	0.63 (0.25–1.00)	0.46 (0.18–0.71)	0.80 (0.70–0.89)	0.83 (0.72–0.94)	0.50 (0.15–0.81)
LR	0.73 (0.48–0.93)	0.43 (0.17–0.69)	0.75 (0.40–1.00)	0.53 (0.25–0.77)	0.82 (0.71–0.91)	0.83 (0.71–0.93)	0.46 (0.14–0.75)
SVM	0.73 (0.50–0.92)	0.33 (0.13–0.56)	0.75 (0.40–1.00)	0.45 (0.21–0.69)	0.75 (0.64–0.86)	0.75 (0.63–0.86)	0.44 (0.14–0.74)
LightGBM	0.72 (0.48–0.92)	0.38 (0.13–0.67)	0.62 (0.25–1.00)	0.46 (0.15–0.71)	0.81 (0.70–0.89)	0.83 (0.72–0.94)	0.46 (0.14–0.77)
MLP	0.70 (0.47–0.88)	0.38 (0.13–0.65)	0.62 (0.25–1.00)	0.46 (0.17–0.71)	0.80 (0.70–0.89)	0.83 (0.72–0.94)	0.33 (0.12–0.60)

AUPRC, Area under the precision-recall curve; AUROC, Area Under the Receiver Operating Characteristic Curve; CatBoost, Categorical Boosting; CI, Confidence Interval; GBC, Gradient Boosting Classifier; LightGBM, Light Gradient Boosting Machine; LR, Logistic Regression; MLP, Multi-Layer Perceptron; PPV, positive predictive value; SVM, Support Vector Machine; XGBoost, Extreme Gradient Boosting.

Figure 1.

Confusion matrix, AUROC curve and AUPRC of the best model (random forest). Validation dataset (n = 56). True positive rate = 0.625, False positive rate = 0.104, True negative rate = 0.896, False negative rate = 0.375.

The CatBoost model demonstrated comparable performance, with an AUROC of 0.74 (95% CI: 0.49–0.94), precision of 0.41 (95% CI: 0.13–0.73), recall of 0.62 (95% CI: 0.25–1.00), F1 score of 0.49 (95% CI: 0.15–0.76), accuracy of 0.82 (95% CI: 0.71–0.91), specificity of 0.85 (95% CI: 0.75–0.94), and an AUPRC of 0.50 (95% CI: 0.16–0.82). While CatBoost had a lower AUROC, accuracy, specificity, precision, and F1 score compared with the random forest model, it demonstrated equivalent recall and slightly outperformed random forest in AUPRC. All other models achieved AUROC values of 0.70 or higher, indicating reasonable discriminatory performance.

Bootstrap resampling was performed on both the training and test datasets at different resample fractions (50%, 75%, and 100%) to assess the stability of model performance (Supplemental Appendix D). This analysis demonstrated that the model was robust in the training data, with narrow confidence intervals indicating stable performance estimates. In the test dataset, the confidence intervals widened, as expected, particularly at smaller resample fractions. Even when using the full test dataset (resample fraction = 100%), the intervals remained wider than those observed in training, reflecting both the limited size of the test dataset and uncertainty in model performance on unseen data. The slightly higher performance observed in the training dataset compared with the test dataset likely contributed to these differences and is consistent with expectations in small-sample settings. Overall, these findings highlight the expected variability in performance estimates on unseen data while supporting the reliability of the model.
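The effect of resample fraction on interval width can be illustrated with a percentile bootstrap on the test set alone. This is a minimal sketch, not the study's procedure: it covers only accuracy, uses the same inferred confusion-matrix counts as elsewhere in these examples (5 true positives, 3 false negatives, 5 false positives, 43 true negatives, which the source does not state directly), and the function names are illustrative.

```python
import random

def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_ci(y_true, y_pred, metric, fraction=1.0, n_boot=1000, seed=42):
    """Percentile 95% CI from n_boot resamples (with replacement) of size fraction * n."""
    rng = random.Random(seed)
    n = len(y_true)
    size = max(1, round(fraction * n))
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(size)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int(0.025 * (len(stats) - 1))]
    upper = stats[int(0.975 * (len(stats) - 1))]
    return lower, upper

# INFERRED test-set predictions: 8 harm cases (5 detected), 48 non-harm cases (5 flagged).
y_true = [1] * 8 + [0] * 48
y_pred = [1] * 5 + [0] * 3 + [1] * 5 + [0] * 43

intervals = {f: bootstrap_ci(y_true, y_pred, accuracy, fraction=f) for f in (0.5, 0.75, 1.0)}
for fraction, (lower, upper) in intervals.items():
    print(f"fraction={fraction}: {lower:.2f} to {upper:.2f}")
```

Smaller resample fractions give wider intervals because each resample contains fewer patients, which matches the pattern described for the test dataset.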

Potential clinical utility

Decision curve analysis (Figure 2) demonstrated that the model provided a higher net benefit than both the “treat-all” and “treat-none” strategies across threshold probabilities of harm of approximately 0.10–0.40. This suggests potential benefit from targeted clinical intervention when the estimated risk of medication harm lies between 10% and 40%, compared with decision-making without model support. These findings underscore the potential clinical utility of the model in routine practice.
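The quantity plotted in a decision curve is net benefit, conventionally defined as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The sketch below applies that definition to hypothetical toy data and compares the model with the "treat-all" strategy; the "treat-none" strategy has a net benefit of zero by definition. Function names and data are illustrative and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening on every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# HYPOTHETICAL labels (1 = harm) and predicted probabilities for ten patients.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80]

nb_model = net_benefit(y_true, y_prob, threshold=0.2)
nb_all = net_benefit_treat_all(y_true, threshold=0.2)
print(round(nb_model, 3), round(nb_all, 3))
```

A model is considered useful at a given threshold when its net benefit exceeds both alternatives, which is the comparison summarised for thresholds of approximately 0.10 to 0.40.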

Figure 2.

Decision curve analysis of best-performing model (random forest).

Model explainability

The SHAP summary plot depicted in Figure 3 illustrates both the relative importance and direction of each feature’s contribution to the model-predicted risk of medication harm, with higher positive SHAP values indicating a greater predicted probability of harm. Table 4 presents the mean absolute SHAP values, which quantify each variable’s overall influence on the model output. Depression showed the highest mean absolute SHAP value (0.144), followed by length of hospital stay (0.091), dementia (0.062), insulin use (0.058), and antibiotic use (0.050). Number of medications (⩾15; 0.044), older age (⩾65 years; 0.042), and opioid use (0.037) were also associated with an increased likelihood of medication harm. In contrast, antiplatelet use was associated with a lower probability of harm, although with a comparatively smaller contribution (0.015). The effect of serum creatinine (0.042) was highly variable, likely reflecting interactions with other patient characteristics.

Figure 3.

SHapley Additive exPlanations features summary plot – Red dots represent higher values or presence of the feature (e.g., longer LOS, use of insulin or antibiotics, presence of dementia), while blue dots indicate lower values or absence of the feature (e.g., shorter LOS, no use of insulin or antibiotics, lower serum creatinine level, absence of dementia). Features positioned to the right of the SHAP value of 0.0 are associated with an increased risk of harm, whereas features to the left suggest a decreased risk of harm. A wider spread or higher density of dots indicates more significant variability or a more substantial impact, respectively, of the values of that feature on the model’s predictions. LOS: Length of Stay, Age Category: ⩾65 years, Meds Category: ⩾15.

Table 4.

SHapley Additive exPlanations (SHAP) values table.

Feature	Mean absolute SHAP values
Depression	0.144425
Length of Stay	0.090819
Dementia	0.061867
Insulin	0.057995
Antibiotics	0.049594
Meds Category	0.043601
Age Category	0.042473
Creatinine	0.041856
Opioids	0.037030
Antiplatelets	0.014668

Age Category: 65 years and older; Meds Category: 15 or more medications.
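A mean absolute SHAP value is the average, across patients, of the magnitude of a feature's per-patient SHAP contribution. The sketch below shows that aggregation in plain Python. The function name and the three-patient SHAP matrix are hypothetical and are not taken from the study; in practice the per-patient values would come from a SHAP explainer applied to the fitted model.

```python
def mean_absolute_shap(shap_rows, feature_names):
    """Rank features by the mean of |SHAP value| across patients."""
    n_patients = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)  # the sign (direction of effect) is discarded
    ranking = [(name, total / n_patients) for name, total in zip(feature_names, totals)]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical per-patient SHAP values (rows = patients, columns = features).
features = ["Depression", "Length of Stay", "Antiplatelets"]
shap_rows = [
    [0.20, 0.10, -0.02],
    [-0.10, 0.05, -0.01],
    [0.15, -0.12, 0.00],
]

ranking = mean_absolute_shap(shap_rows, features)
for name, value in ranking:
    print(f"{name}\t{value:.6f}")
```

Because absolute values are averaged, this summary conveys how strongly a feature moves predictions but not in which direction; the direction has to be read from the signed values shown in the summary plot.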

Additionally, the random forest feature importance plot with 95% confidence intervals (Supplemental Appendix E) shows each predictor’s contribution to model performance. Length of hospital stay demonstrated the highest mean importance (0.230 [95% CI: 0.197–0.266]), followed by serum creatinine (0.202 [0.172–0.233]) and depression (0.166 [0.118–0.220]). Dementia (0.093 [0.057–0.133]), insulin use (0.061 [0.038–0.100]), and number of medications (0.060 [0.040–0.093]) were also important contributors, while age category (⩾65 years; 0.052 [0.034–0.073]), opioid use (0.049 [0.033–0.073]), antibiotic use (0.048 [0.031–0.074]), and antiplatelet use (0.039 [0.030–0.049]) showed relatively smaller but consistent contributions to model performance. Overall, these results support the SHAP findings with depression, length of hospital stay, dementia, insulin use, antibiotic use, number of medications (⩾15), older age (⩾65 years), and opioid use emerging as the most influential predictors of medication harm.

Discussion

This study developed and evaluated eight ML models to predict medication harm in hospitalised older adult patients using a prospective dataset collected from 279 patients, supplemented by 164 oversampled cases of harm to address class imbalance. It has several strengths. First, compared with prior studies relying solely on relatively insensitive ICD coding of medication harm,⁶⁷,⁶⁸ our study adopted a more comprehensive approach to ascertainment by prospectively collecting data from multiple data sources in addition to ICD-10 coding. Second, we used clinical experts both to identify clinically relevant features and to verify harm events as being causally related to medication, the latter a frequent omission among risk prediction studies,⁶⁹ underscoring the value of interdisciplinary collaboration between model developers and clinicians. Third, we trained eight separate ML models using 10 features and applied oversampling methods to address class imbalance in observed cases of harm. Fourth, we reported several performance metrics, including AUPRC, which together are more informative than AUROC alone. Finally, we identified a high-performing random forest model that outperformed the logistic regression model built in this study across multiple metrics (AUROC, AUPRC, PPV, F1 score, accuracy, and specificity), consistent with its greater ability to encode complex, non-linear relationships and interactions among a greater number of features.²⁰,²¹ As reported, the random forest model achieved an AUROC of 0.76, PPV of 0.50, sensitivity of 0.62, and specificity of 0.90, compared with 0.79, 0.14, 0.69, and 0.98, respectively, for the original regression-based AIME-Frail model.²²

The AUROC of the random forest model (0.76) indicates acceptable overall discrimination, and the AUPRC (0.47) compares favourably with the value expected from a random classifier, which equals the prevalence of medication harm (40 of 279 patients; 14.3%). The AUPRC is therefore approximately 3.3 times that expected by chance, and is comparable with values frequently observed in other studies featuring class imbalance.⁷⁰,⁷¹ The improved PPV of the random forest model comes at the expense of a somewhat lower sensitivity and specificity compared with the AIME-Frail model.²² The higher PPV for the random forest may also reflect a higher prevalence of harm in this study, which used more sensitive measures of harm than AIME-Frail, which relied on ICD discharge coding alone.²²
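The comparison of AUPRC with chance is simple arithmetic and can be checked as follows. The sketch uses only figures reported above (40 harm cases among 279 patients and an AUPRC of 0.47); the variable names are illustrative.

```python
# Figures reported in the study.
harm_patients = 40
cohort_size = 279
auprc_random_forest = 0.47

# A classifier with no skill has an expected AUPRC equal to the prevalence.
baseline_auprc = harm_patients / cohort_size
lift_over_chance = auprc_random_forest / baseline_auprc

print(f"baseline AUPRC (prevalence): {baseline_auprc:.3f}")
print(f"AUPRC relative to chance: {lift_over_chance:.1f}x")
```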

Comparison with prior research

Previous studies, including both ML-based models⁷²⁻⁷⁸ and one traditional statistical model,⁷⁶ have identified various clinical features predictive of medication harm in hospitalised patients that correspond to those in our ML model.

Seger et al.⁷³ reported that among at-risk patients, receiving five or more medications was associated with a mean (standard deviation [SD]) of 6.7 (4.1) medication warnings. Wei et al.⁷⁴ demonstrated that a pre-existing elevated creatinine level is a predictor of medication-related acute kidney injury, defined as an increase in serum creatinine of 0.3 mg/dL or 50% from the baseline value or urine output of <0.5 mL/kg/h. Dave et al.⁷⁵ identified the amount and timing of insulin doses delivered by insulin pumps and carbohydrate intake as features predictive of hypoglycaemic risk.

Hu et al.⁷² built an ML model in which longer length of hospitalisation, older age, and antibiotic use were seen as important features predicting medication harm, as we also noted. However, no other ML study, apart from ours, has specifically identified dementia, opioid use, and depression as predictors, although these have been noted in various non-ML clinical studies. Just et al.⁷⁷ calculated odds ratios (ORs) to assess the association of opioids with medication harm, reporting an OR of 1.79 (95% CI: 1.26–2.54). Onder et al.⁷⁸ similarly identified depression as a predictor of medication harm among hospitalised older adults, with an OR of 1.58 (95% CI: 1.14–2.20).

In contrast, Sakiris et al.⁷⁶ found that adverse drug events were less frequent in patients with dementia versus those without dementia (8.3% vs. 14.6%; p < 0.001). This association was derived from a multivariable logistic regression model and reflects the adjusted effect of dementia within the context of the included predictors, not a direct causal effect. Another unexpected finding in our study was the association between antiplatelet use and lower risk of medication harm, which differs from the well-documented risk of bleeding, particularly in patients receiving multiple antiplatelet agents.⁷⁹

Clinical implications

From a clinical perspective, the greater PPV of the random forest model means clinicians need to review only two flagged patients to identify one who is truly at risk of experiencing medication harm, enabling a targeted approach to medication review and other interventions without overwhelming clinicians with false alerts. At the same time, the random forest model fails to identify approximately 38% of patients who experience harm (sensitivity 0.62), and these missed events may or may not be serious events impacting patient well-being. Further studies are warranted to examine the relationship between model-predicted risk and the severity of harm events. Increasing sensitivity when harm prevalence is low comes with an increase in false positives, which can exacerbate clinician workloads. Such trade-offs are inevitable in using prediction models, and local stakeholders must choose the appropriate trade-off depending on site-specific imperatives regarding patient safety, staffing ratios and workloads, and resource availability.
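The workload implications of these metrics follow from basic identities, sketched below. The reported PPV, sensitivity, specificity and cohort prevalence are taken from the study; the function name is hypothetical, and applying Bayes' rule to these summary figures is an approximation because the reported metrics were estimated on held-out data rather than on the whole cohort.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value implied by Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Figures reported for the random forest model.
ppv_reported = 0.50
sensitivity = 0.62
specificity = 0.90
prevalence = 40 / 279

# Flagged patients reviewed per patient truly at risk.
reviews_per_true_case = 1.0 / ppv_reported
# Share of patients with harm that the model does not flag.
missed_fraction = 1.0 - sensitivity

# PPV depends on prevalence: halving prevalence lowers PPV at fixed
# sensitivity and specificity.
ppv_at_cohort_prevalence = ppv_from_rates(sensitivity, specificity, prevalence)
ppv_at_half_prevalence = ppv_from_rates(sensitivity, specificity, prevalence / 2)

print(f"reviews per true case: {reviews_per_true_case:.1f}")
print(f"missed fraction: {missed_fraction:.2f}")
print(f"PPV at cohort prevalence: {ppv_at_cohort_prevalence:.2f}")
print(f"PPV at half that prevalence: {ppv_at_half_prevalence:.2f}")
```

The PPV implied at the cohort prevalence is close to the reported 0.50, and the drop at lower prevalence illustrates why PPV values are not directly comparable between studies that ascertain harm with different sensitivity.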

Despite these possible shortcomings in performance, prediction models have shown potential to improve patient medication management. As examples, Segal et al.⁸⁰ reported that their ML model was highly accurate in detecting medication errors likely to cause adverse drug events. Stanekova et al.⁸¹ developed highly accurate models that distinguish penicillin allergy from intolerance and high-risk from low-risk allergy. Finally, Imai et al.⁸² developed a neural network model that accurately predicts vancomycin-induced nephrotoxicity.

Limitations

A major limitation was the small dataset used for model training, which was obtained from patients admitted to one hospital. This risks model overfitting and limits its generalisability, especially when it is yet to be externally validated. Additionally, we observed wide confidence intervals for certain performance metrics, such as recall, reflecting both the small dataset size and the use of oversampling methods to address class imbalance, which may have increased variance and instability in performance estimates.
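The effect of sample size on the width of a confidence interval for recall can be illustrated with a Wilson score interval. This is a sketch under stated assumptions: the counts are hypothetical and chosen only for illustration, and the Wilson method is one common choice, not necessarily the method used in this study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(
        p * (1.0 - p) / total + z * z / (4.0 * total * total)
    ) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: the same recall (0.625) estimated from few or many harm cases.
small_sample = wilson_interval(5, 8)
large_sample = wilson_interval(50, 80)

print(f"5 of 8 detected:   {small_sample[0]:.2f} to {small_sample[1]:.2f}")
print(f"50 of 80 detected: {large_sample[0]:.2f} to {large_sample[1]:.2f}")
```

With only a handful of harm cases in a held-out set, the interval spans more than half of the possible range, which is why a larger dataset is needed before recall can be estimated precisely.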

Excluding the CFS from model development due to its lack of routine documentation in EHRs, despite its potential as an important predictor of harm, highlights the challenge of improving EHR documentation to support more accurate risk prediction models that use all potentially useful clinical features. The present study underscores the importance of considering both data availability and clinical relevance when selecting features in ensuring models can be integrated into real-world clinical settings.

Future directions

Future studies predicting medication harm using ML should prioritise collecting contemporary and representative large datasets from EHRs and employing sensitive measures of medication harm to enhance model accuracy and generalisability. As there is no single best model technique, future research should compare various models to ensure the most suitable model is identified and selected for a given prediction task in a specific setting targeting a defined patient population.⁸³ In doing so, collaboration between data scientists and clinicians is essential in developing and evaluating clinically applicable models that can be integrated into local workflows and assessed for their impact on care and patient outcomes through prospective “silent mode” evaluations and pragmatic clinical trials.⁸⁴

Conclusion

This study highlights the potential of supervised ML models for predicting medication harm among hospitalised adult patients. As a proof of concept, it demonstrates that a random forest model with high PPV can be developed using a relatively small dataset of prospectively ascertained and clinician-verified harm events, with close collaboration between data scientists and clinicians to ensure appropriate selection of predictive features. Although such methods are resource-intensive, having larger clinically curated datasets from specific patient populations will yield more accurate and generalisable models likely to perform reliably when externally validated, paving the way for clinical trials and eventual adoption into clinical practice.

Supplemental Material

sj-docx-1-taw-10.1177_20420986251409325 – Supplemental material for Machine learning risk prediction models for medication harm in hospitalised adult patients

Supplemental material, sj-docx-1-taw-10.1177_20420986251409325 for Machine learning risk prediction models for medication harm in hospitalised adult patients by Jonathan Yong Jie Lam, Michael Barras, Ian A. Scott, Hassan Masood, Ahmad Abdel-Hafez and Nazanin Falconer in Therapeutic Advances in Drug Safety

Footnotes

Acknowledgements

The authors would like to thank the Princess Alexandra Research Foundation for their generous support of research in the field of patient safety.

Declarations

ORCID iDs

Jonathan Yong Jie Lam

Michael Barras

Ian A. Scott

Hassan Masood

Ahmad Abdel-Hafez

Nazanin Falconer

Supplemental material

Supplemental material for this article is available online.

References

1. Hakkarainen, Hedna, Petzold, et al. Percentage of patients with preventable adverse drug reactions and preventability of adverse drug reactions–a meta-analysis. PLoS One 2012; 7(3): e33236.

2. Coleman, Pontefract SK. Adverse drug reactions. Clin Med (Lond) 2016; 16(5): 481–485.

3. Hacker. Chapter 13 – Adverse drug reactions. In: Hacker, Messer, Bachmann (Eds). Pharmacology. Academic Press, 2009, pp. 327–352.

4. Walsh, Hansen, Sahm, et al. Economic impact of medication error: a systematic review. Pharmacoepidemiol Drug Saf 2017; 26(5): 481–497.

5. Tecklenborg, Byrne, Cahir, et al. Interventions to reduce adverse drug event-related outcomes in older adults: a systematic review and meta-analysis. Drugs Aging 2020; 37: 91–98.

6. Laatikainen, Miettunen, Sneck, et al. The prevalence of medication-related adverse events in inpatients–a systematic review and meta-analysis. Eur J Clin Pharmacol 2017; 73(12): 1539–1549.

7. Pfister, Jonsson, Gustafsson. Drug-related problems and medication reviews among old people with dementia. BMC Pharmacol Toxicol 2017; 18(1): 52.

8. Ramasamy, Baysari, Lehnbom, et al. Evidence briefings on interventions to improve medication safety: double-checking medication administration. Sydney, Australia, 2013.

9. Mergenhagen, Blum, Kugler, et al. Pharmacist- versus physician-initiated admission medication reconciliation: impact on adverse drug events. Am J Geriatr Pharmacother 2012; 10(4): 242–250.

10. Falconer, Barras, Cottrell. How hospital pharmacists prioritise patients at high-risk for medication harm. Res Social Adm Pharm 2019; 15(10): 1266–1273.

11. Schnake-Mahl, Carty, Sierra, et al. Identifying patients with increased risk of severe COVID-19 complications: building an actionable rules-based model for care teams. NEJM Catalyst Innov Care Deliv 2020; 1(3): 3–4.

12. Ludlow, Churruca, Mumford, et al. Staff members’ prioritisation of care in residential aged care facilities: a Q methodology study. BMC Health Serv Res 2020; 20(1): 423.

13. Frydenlund, Cosgrave, Moriarty, et al. Adverse drug reactions and events in an Ageing PopulaTion risk Prediction (ADAPTiP) tool: the development and validation of a model for predicting adverse drug reactions and events in older patients. Eur Geriatr Med 2025; 16(2): 573–581.

14. Langenberger. Machine learning as a tool to identify inpatients who are not at risk of adverse drug events in a large dataset of a tertiary care hospital in the USA. Br J Clin Pharmacol 2023; 89(12): 3523–3538.

15. Rajula HSR, Verlato, Manchia, et al. Comparison of conventional statistical methods with machine learning in medicine: diagnosis, drug development, and treatment. Medicina (Kaunas) 2020; 56(9): 455.

16. Shin, Austin, Ross, et al. Machine learning vs. conventional statistical models for predicting heart failure readmission and mortality. ESC Heart Failure 2021; 8(1): 106–115.

17. Churpek, Yuen, Winslow, et al. Multicenter comparison of machine learning methods and conventional regression for predicting clinical deterioration on the wards. Crit Care Med 2016; 44(2): 368–374.

18. Hernesniemi, Mahdiani, Tynkkynen, et al. Extensive phenotype data and machine learning in prediction of mortality in acute coronary syndrome – the MADDEC study. Ann Med 2019; 51(2): 156–163.

19. Christodoulou, Collins, et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol 2019; 110: 12–22.

20. Alanazi, Abdullah, Qureshi KN. A critical review for developing accurate and dynamic predictive models using machine learning methods in medicine and health care. J Med Syst 2017; 41(4): 69.

21. Mir, Dhage SN. Diabetes disease prediction using machine learning on big data of healthcare. In: 2018 fourth international conference on computing communication control and automation (ICCUBEA), 2018: 1–6.

22. Falconer, Scott, Abdel-Hafez, et al. The adverse inpatient medication event and frailty (AIME-frail) risk prediction model. Res Social Adm Pharm 2024; 20(8): 796–803.

23. Steyerberg, Vergouwe. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J 2014; 35(29): 1925–1931.

24. Collins, Reitsma, Altman, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMC Med 2015; 13(1): 1.

25. Lam JYJ, Barras, Scott, et al. Impact evaluation of the modified adverse inpatient medication event (AIME-Frail) model in hospitalised adults. Res Social Adm Pharm 2025; 21: 687–696.

26. Australian Commission on Safety and Quality in Health Care. APINCHS Classification of High Risk Medicines. Sydney, Australia: Australian Commission on Safety and Quality in Health Care, 2019.

27. Gilbert, Neuburger, Kraindler, et al. Development and validation of a Hospital Frailty Risk Score focusing on older people in acute care settings using electronic hospital records: an observational study. Lancet 2018; 391(10132): 1775–1782.

28. Rockwood, Song, MacKnight, et al. A global clinical measure of fitness and frailty in elderly people. CMAJ 2005; 173(5): 489–495.

29. Falconer, Barras, Martin, et al. Defining and classifying terminology for medication harm: a call for consensus. Eur J Clin Pharmacol 2019; 75(2): 137–145.

30. Wang, Lam JYJ, Falconer, et al. Prospective identification of medication harm in geriatric inpatients using a modified trigger tool. J Pharm Pract Res 2024; 54(5): 376–383.

31. World Health Organization. International Statistical Classification of Diseases and Related Health Problems, 10th Rev. World Health Organization, 2016.

32. World Health Organization-Uppsala Monitoring Centre (WHO-UMC). The use of the WHO-UMC system for standardised case causality assessment. https://who-umc.org/media/164200/who-umc-causality-assessment_new-logo.pdf (accessed 1 May 2023).

33. Riley, Ensor, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ 2020; 368: m441.

34. Singh, Bajorek. Defining ‘elderly’ in clinical practice guidelines for pharmacotherapy. Pharm Pract (Granada) 2014; 12(4): 489.

35. Castro-Rodríguez, Machado-Duque, Gaviria-Mendoza, et al. Factors related to excessive polypharmacy (⩾15 medications) in an outpatient population from Colombia. Int J Clin Pract 2018; 73: e13278.

36. Pedregosa, Varoquaux, Gramfort, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825–2830.

37. Breiman. Random forests. Mach Learn 2001; 45: 5–32.

38. Meng, Finley, et al. Lightgbm: a highly efficient gradient boosting decision tree. Adv Neural Inform Proc Syst 2017; 30: 1–8.

39. Chen, Guestrin. Xgboost: a scalable tree boosting system. arXiv 2016: 785–794.

40. Cox DR. The regression analysis of binary sequences. J Royal Stat Soc Ser Statist Methodol 1958; 20(2): 215–232.

41. Prokhorenkova, Gusev, Vorobev, et al. CatBoost: unbiased boosting with categorical features. Adv Neural Inform Proc Syst 2018; 31: 1–9.

42. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Statist 2001; 29(5): 1189–1232.

43. Rumelhart, Hinton, Williams RJ. Learning representations by back-propagating errors. Nature 1986; 323(6088): 533–536.

44. Cortes. Support-vector networks. Mach Learn 1995; 20: 273–297.

45. Guan, Chang, et al. Interpretable machine learning models for predicting venous thromboembolism in the intensive care unit: an analysis based on data from 207 centers. Crit Care 2023; 27(1): 406.

46. Chittora, Chaurasia, Chakrabarti, et al. Prediction of chronic kidney disease – a machine learning perspective. IEEE Access 2021; 9: 17312–17334.

47. Wei, Rao, Xiao, et al. Risk assessment of cardiovascular disease based on SOLSSA-CatBoost model. Expert Syst Applic 2023; 219: 119648.

48. Géron. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. Sebastopol, CA: O’Reilly Media, Inc., 2022.

49. Haibo, Yang, Garcia, et al. ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 2008, pp. 1322–1328.

50. Pirjatullah, Nugrahadi DT. Hyperparameter tuning method of extreme learning machine (ELM) using gridsearchcv in classification of pneumonia in toddlers. J Data Sci Softw Eng 2022; 2(3): 131–140.

51. Shanthi, Chethan. Genetic algorithm based hyper-parameter tuning to improve the performance of machine learning models. SN Computer Sci 2022; 4(2): 119.

52. Mahesh, Geman, Margala, et al. The stratified K-folds cross-validation and class-balancing methods with high-performance ensemble classifiers for breast cancer classification. Healthcare Analytics 2023; 4: 100247.

53. Unal. Defining an optimal cut-point value in ROC analysis: an alternative approach. Comput Mathem Methods Med 2017; 2017(1): 3762651.

54. Zhou. Statistical inferences for the Youden Index. Georgia State University, 2011.

55. Srinivasu, Sandhya, Jhaveri, et al. From blackbox to explainable AI in healthcare: existing tools and case studies. Mobile Inform Syst 2022; 2022(1): 8167821.

56. Rossum. Python library reference. CWI (Centre for Mathematics and Computer Science), 1995.

57. Kluyver, Ragan-Kelley, Pérez, et al. Jupyter Notebooks–a publishing format for reproducible computational workflows. In: Positioning and power in academic publishing: Players, agents and agendas. Amsterdam, Netherlands: IOS Press, 2016, pp. 87–90.

58. Reback, McKinney, Van Den Bossche, et al. pandas-dev/pandas: Pandas 1.0.5. Zenodo, 2020.

59. Harris, Millman, Van Der Walt, et al. Array programming with NumPy. Nature 2020; 585(7825): 357–362.

60. Virtanen, Gommers, Oliphant, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 2020; 17(3): 261–272.

61. Seabold, Perktold. Statsmodels: econometric and statistical modeling with Python. SciPy 2010; 7(1): 92–96.

62. Hunter JD. Matplotlib: a 2D graphics environment. Comput Sci Engin 2007; 9(3): 90–95.

63. Waskom ML. Seaborn: statistical data visualization. J Open Source Softw 2021; 6(60): 3021.

64. Pedregosa, Varoquaux, Gramfort, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825–2830.

65. Lemaître, Nogueira, Aridas CK. Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res 2017; 18(17): 1–5.

66. Lundberg, Lee S-I. A unified approach to interpreting model predictions. Adv Neural Inform Proc Syst 2017; 30: 4768–4777.

67. Zhao, Henriksson, Asker, et al. Predictive modeling of structured electronic health records for adverse drug event detection. BMC Med Inform Decis Mak 2015; 15(4): S1.

68. Wolff, Ríos, Gonzáles. Machine learning methods for predicting adverse drug reactions in hospitalized patients. Procedia Computer Science 2023; 225: 22–31.

69. Yasrebi-de Kom IAR, Dongelmans, de Keizer, et al. Electronic health record-based prediction models for in-hospital adverse drug event diagnosis or prognosis: a systematic review. J Am Med Inform Assoc 2023; 30(5): 978–988.

70. Park, Lee, et al. Machine learning-based COVID-19 patients triage algorithm using patient-generated health data from nationwide multicenter database. Infect Dis Ther 2022; 11(2): 787–805.

71. JXC, DhakshinaMurthy, George, et al. The effect of resampling techniques on the performances of machine learning clinical risk prediction models in the setting of severe class imbalance: development and internal validation in a retrospective cohort. Discov Artif Intellig 2024; 4(1): 91.

72. Hu, et al. Predicting adverse drug events in older inpatients: a machine learning study. Int J Clin Pharm 2022; 44(6): 1304–1311.

73. Seger, Amato, Frits, et al. A machine learning technology for addressing medication-related risk in older, multimorbid patients. Am J Managed Care 2024; 30(8): e233–e239.

74. Wei, Zhang, Feng, et al. Machine learning model for predicting acute kidney injury progression in critically ill patients. BMC Med Inform Decis Mak 2022; 22(1): 17.

75. Dave, DeSalvo, Haridas, et al. Feature-based machine learning model for real-time hypoglycemia prediction. J Diabetes Sci Technol 2021; 15(4): 842–855.

76. Sakiris, Hilmer, Sawan, et al. Prevalence of adverse drug reactions in hospital among older patients with and without dementia. Drugs Aging 2024; 41(10): 833–846.

77. Just, Dormann, Böhme, et al. Personalising drug safety–results from the multi-centre prospective observational study on Adverse Drug Reactions in Emergency Departments (ADRED). Eur J Clin Pharmacol 2020; 76(3): 439–448.

78. Onder, Penninx BWJH, Landi, et al. Depression and adverse drug reactions among hospitalized older adults. Arch Intern Med 2003; 163(3): 301–305.

79. Paradissis, Cottrell, Coombes, et al. Unplanned rehospitalisation due to medication harm following an acute myocardial infarction. Cardiology 2024: 1–15.

80. Segal, Segev, Brom, et al. Reducing drug prescription errors and adverse drug events by application of a probabilistic, machine-learning based clinical decision support system in an inpatient setting. J Am Med Inform Assoc 2019; 26(12): 1560–1565.

81. Stanekova, Inglis, Lam, et al. Improving the performance of machine learning penicillin adverse drug reaction classification with synthetic data and transfer learning. Intern Med J 2024; 54(7): 1183–1189.

82. Imai, Takekuma, Kashiwagi, et al. Validation of the usefulness of artificial neural networks for risk prediction of adverse drug reactions used for individual patients in clinical practice. PLoS One 2020; 15(7): e0236789.

83. Sufriyana, Husnayain, Chen Y-L, et al. Comparison of multivariable logistic regression and other machine learning algorithms for prognostic prediction studies in pregnancy care: systematic review and meta-analysis. JMIR Med Inform 2020; 8(11): e16503.

84. Bastian, Baker, Limon. Bridging the divide between data scientists and clinicians. Intellig-Based Med 2022; 6: 100066.
