
Abstract

Aim

To synthesize recent research on artificial intelligence (AI) in intensive care unit (ICU) nursing from 2020 to 2025, highlight trends, and outline integration challenges.

Methods

A narrative synthesis approach was used, reviewing English-language studies from PubMed, Web of Science, Scopus, and IEEE Xplore. From 4138 articles, 37 studies were included.

Results

Evidence was international with strong contributions from Asia and North America. Most studies were retrospective and drew on large ICU databases such as MIMIC-III/IV and eICU. Methods were dominated by machine learning, with limited but growing deep learning. Applications clustered around early warning and risk prediction, with additional work on nursing decision support and workload or documentation support. Reported discrimination frequently exceeded AUC 0.80, while calibration, external validation, and human factors evaluation were less often described.

Conclusion

Artificial intelligence shows promise for earlier risk recognition, decision support, and workflow enablement in ICU nursing. Priorities include multicenter prospective evaluation, external validation with calibration, electronic health record-embedded implementation, and nurse codesign to ensure safe, useful, and generalizable tools.

Implications for clinical practice

Thoughtfully integrated AI can support timely decisions and reduce documentation burden when paired with real-time validation and nurse-led workflow adaptation.

Keywords

Artificial intelligence, critical care nursing, ICU, machine learning, predictive analytics

Introduction

Artificial intelligence (AI) has rapidly emerged as a transformative force in healthcare, driven by its potential to improve clinical decision-making, efficiency, and patient outcomes.¹ In critical care settings, such as the intensive care unit (ICU), nurses manage vast amounts of complex, time-sensitive patient data and make high-stakes decisions under pressure.² Artificial intelligence technologies offer tools to assist in this context—for example, by continuously monitoring patient vitals, detecting subtle trends or deteriorations, and providing decision support that can augment nurses’ clinical judgment.³ Over the past decade, there has been a marked increase in research exploring AI applications in critical care nursing, reflecting growing enthusiasm for integrating machine intelligence into ICU nursing practice.⁴ Publications on this topic have followed an upward trajectory globally, with particularly active contributions from countries such as the United States, China, and the United Kingdom. This trend underscores a broad recognition that AI could play a significant role in optimizing ICU workflows and enhancing the precision and timeliness of nursing care.⁵

Current research hotspots illustrate the diverse ways AI is being leveraged to support critical care nursing. Recent work has concentrated on several major domains of AI application in ICU nursing, including continuous patient monitoring, predictive risk modeling, clinical decision support systems, nursing interventions, documentation automation, and resource allocation.⁶ Predictive analytics is especially prominent: AI models have been developed to forecast clinical events such as sepsis onset, pressure injuries, delirium episodes, or unexpected ICU transfers, enabling earlier interventions to prevent complications.⁷ These innovations aim to enhance patient safety and quality of care by improving early problem recognition, supporting complex clinical decisions, streamlining documentation, and optimizing the efficiency of care delivery in the ICU. Early studies have reported promising results, such as high accuracy in predicting adverse events or reductions in nurses’ charting time, suggesting that thoughtfully deployed AI could empower ICU nurses and improve patient outcomes.⁸

Despite its promise, the integration of AI into critical care nursing comes with substantial challenges. Intensive care unit nurses and leaders have noted that AI presents both opportunities and difficulties in practice.⁹ One major concern is the “black box” nature of many AI algorithms: a lack of transparency in how recommendations are generated can hinder clinicians’ trust in AI tools.¹⁰ Nurses have emphasized that understanding an AI system's reasoning is essential for them to feel confident incorporating its suggestions into patient care. Another practical limitation is output verbosity in generative systems. Simulation work with ICU nurses shows that overly long, detail-heavy suggestions complicate information triage and distract from key clinical signals, underscoring the need for concise, nurse-tailored summaries and controls over level of detail.¹¹ Relatedly, potential biases in AI models (due to unrepresentative training data or flawed algorithms) raise ethical concerns, as AI could inadvertently perpetuate healthcare disparities or produce unsafe recommendations if not carefully monitored.¹² Implementation into clinical workflows is another challenge. Critical care is a fast-paced, unpredictable environment, and AI systems must integrate seamlessly with existing electronic health records (EHRs) and nursing routines to be truly useful. Many hospitals lack clear guidelines or standards for introducing AI into nursing practice, making it difficult to scale up successful pilot projects.¹³ Additionally, nurses worry about overreliance on AI: if clinicians begin to depend uncritically on algorithm outputs, their clinical skills and autonomy could erode over time. Maintaining a balance where AI provides support without diminishing the central role of human clinical expertise is therefore critical.¹⁴ Finally, the evidence base for AI in ICU nursing, while rapidly growing, is still evolving. Many published studies are retrospective or proof-of-concept, with heterogeneous methods and endpoints, making it hard to draw definitive conclusions about real-world effectiveness.

In light of this background, the purpose of this review is to synthesize the recent literature on AI applications in critical care nursing and clarify the current state of knowledge in this domain. We specifically focus on ICU settings from a nursing perspective, examining how AI has been applied to support the work of critical care nurses and affect patient care. This review concentrates on the last five years (2020–2025) of published English-language literature, encompassing both original research studies and review articles relevant to AI in ICU nursing. By restricting to this recent time frame and including nursing-focused studies, we aim to capture the contemporary trends, innovations, and challenges that define the intersection of AI and critical care nursing. The review seeks to (1) summarize key areas of application and findings from recent studies, (2) identify prevailing themes or “hot spots” in research (for example, common AI targets such as patient monitoring or risk prediction), and (3) discuss the challenges, knowledge gaps, and implications for nursing practice and future research. Ultimately, this work is intended to provide critical care nurses, nurse leaders, and researchers with a clear overview of how AI is influencing ICU nursing today and to guide efforts to harness AI effectively in this high-stakes field.

Methodology

Study design

This review employed a narrative synthesis approach to summarize and categorize original research articles focusing on AI applications in ICU nursing settings. Due to the heterogeneity in study designs, data types, AI technologies used, and outcome measures, a meta-analysis was not feasible. The review focused on empirical studies published in peer-reviewed English-language journals over the past five years.

Inclusion and exclusion criteria

Studies were included if they met the following criteria: (1) focused on the application of AI in ICU nursing practice or within clinical decision support systems that directly involve nurses; (2) comprised either original empirical research—such as primary data collection or secondary analysis of clinical datasets—or peer-reviewed review articles, including systematic reviews, scoping reviews, or narrative reviews; (3) provided a clear description of the AI methodology, including the type of algorithm used, model training, and validation approach, where applicable; and (4) reported measurable outcomes relevant to clinical performance, nursing processes, patient care, or nursing-related knowledge synthesis.

Data sources and search strategy

A comprehensive search was conducted across four major academic databases (PubMed, Web of Science, Scopus, and IEEE Xplore) to identify relevant literature on AI applications in ICU nursing. The following keywords were combined with Boolean operators: (“intensive care unit” OR “ICU”) AND (“nursing” OR “nurse”) AND (“artificial intelligence” OR “machine learning” OR “deep learning” OR “natural language processing” OR “predictive model” OR “generative AI” OR “large language model” OR “LLM”). The search covered studies published between 1 January 2020 and 30 October 2025 and was limited to studies published in English. Both original empirical studies and review articles (including systematic reviews, scoping reviews, and narrative reviews) were considered eligible. Publications such as editorials, opinion pieces, conference abstracts, and non-peer-reviewed materials were excluded to ensure the inclusion of scientifically rigorous and thematically relevant sources. The search initially retrieved 4138 articles.
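
The Boolean structure of the search string can be made explicit with a short sketch. The three term groups are copied from the search strategy above; the helper names (`or_group`, `build_query`) are illustrative assumptions, and the output is a generic string, not the exact syntax submitted to any one database.

```python
# Term groups copied from the search strategy; group names are illustrative.
SETTING = ["intensive care unit", "ICU"]
PROFESSION = ["nursing", "nurse"]
TECHNOLOGY = [
    "artificial intelligence", "machine learning", "deep learning",
    "natural language processing", "predictive model", "generative AI",
    "large language model", "LLM",
]


def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """Join the parenthesized OR-groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(SETTING, PROFESSION, TECHNOLOGY)
print(query)
```

Each database applies its own field tags and phrase-matching rules, so a string like this would normally be adapted per database before use.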

Study selection and screening

The initial database search yielded a total of 4138 records. All retrieved articles were imported into EndNote reference management software, where duplicate entries were identified and removed using the software's de-duplication function. A manual verification step was then performed to ensure all duplicates, reviews, and clearly irrelevant records were excluded.
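
To illustrate what automated de-duplication involves, the following minimal sketch removes repeated records retrieved from several databases. It is not EndNote's algorithm: the matching rule (DOI first, otherwise normalized title plus year), the field names, and the example records are all assumptions made for illustration.

```python
import re


def normalize_title(title):
    """Lowercase the title, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return " ".join(cleaned.split())


def record_key(record):
    """Prefer the DOI when present; otherwise fall back to title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title", normalize_title(record["title"]), record.get("year"))


def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records for illustration only (not taken from the review).
records = [
    {"title": "Example Study A", "year": 2021, "doi": "10.0000/example.a", "source": "PubMed"},
    {"title": "Example study A.", "year": 2021, "doi": "10.0000/EXAMPLE.A", "source": "Scopus"},
    {"title": "Example Study B", "year": 2023, "doi": "", "source": "Web of Science"},
    {"title": "Example study B", "year": 2023, "doi": None, "source": "IEEE Xplore"},
]
unique = deduplicate(records)
print([record["source"] for record in unique])
```

A rule of this kind misses pairs in which only one copy carries a DOI, which is one reason a manual verification step remains useful after automated de-duplication.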

Following de-duplication, a two-stage screening process was conducted. In the first stage, titles and abstracts were independently screened by two reviewers to determine potential relevance based on the predefined eligibility criteria. Articles that were clearly unrelated to AI, ICU settings, or nursing practice were excluded at this stage. In the second stage, full-text reviews were carried out for the remaining records to assess whether they met all inclusion criteria. Studies were excluded during this phase for reasons including lack of nursing relevance, absence of AI applications, focus outside the ICU setting, or insufficient methodological detail. Discrepancies between reviewers were resolved through discussion or, when necessary, consultation with a third reviewer.
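
The dual-reviewer logic described above can be sketched as a small reconciliation routine: matching decisions are accepted, and disagreements are settled by an adjudicated decision (from discussion or a third reviewer) or left flagged. The function name, record identifiers, and decisions below are hypothetical and serve only to illustrate the process.

```python
def reconcile(reviewer_a, reviewer_b, adjudicated=None):
    """Combine two reviewers' include/exclude decisions per record ID.

    Agreements are accepted as-is. Disagreements take the adjudicated
    decision when one exists and are otherwise returned as unresolved.
    """
    adjudicated = adjudicated or {}
    final = {}
    unresolved = []
    for record_id, decision_a in reviewer_a.items():
        decision_b = reviewer_b[record_id]
        if decision_a == decision_b:
            final[record_id] = decision_a
        elif record_id in adjudicated:
            final[record_id] = adjudicated[record_id]
        else:
            unresolved.append(record_id)
    return final, unresolved


# Hypothetical screening decisions for three records.
reviewer_a = {"r1": "include", "r2": "exclude", "r3": "include"}
reviewer_b = {"r1": "include", "r2": "include", "r3": "exclude"}

final, unresolved = reconcile(reviewer_a, reviewer_b, adjudicated={"r2": "exclude"})
print(final)
print(unresolved)
```

In this example, record r1 is included by agreement, r2 follows the adjudicated decision, and r3 stays flagged until it is discussed.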

We conducted and report this review in accordance with PRISMA 2020. After completing the full screening process, 37 articles were included in the final synthesis, comprising original empirical studies of AI applications within ICU nursing practice. The identification, screening, eligibility, and inclusion steps are shown in the PRISMA flow diagram (Figure 1).

Figure 1.

PRISMA flow diagram illustrating the literature screening and selection process.

Quality appraisal

The 37 eligible papers were evaluated for quality using the Critical Appraisal Skills Programme (CASP) Checklist for Qualitative Research, as described in a previous analysis.¹⁵ CASP is a widely used critical appraisal instrument for evaluating the overall quality of research reports, particularly qualitative literature reviews. Two reviewers independently assessed each study, with disagreements resolved through discussion or consultation with a third reviewer. The CASP scores ranged from 6 to 10 out of a possible 10 points, indicating generally good methodological quality across the included studies. Table 1 summarizes the results of the quality evaluation for the 37 papers.

Table 1.

Summary of quality assessment.

Criteria	No. of studies scoring Yes (= 1)	No. of studies scoring No (= 0)
A clear statement of the research aims	37	0
Appropriate methodology	36	1
Appropriate research design	35	2
Appropriate recruitment strategy	37	0
Appropriate data collection	37	0
The relationship between the researcher and participants described	35	2
Ethical issues considered	35	2
Sufficient data analysis	36	1
A clear statement of findings	34	3
The study is valuable	37	0
Total score (out of 10)	33 studies = 10; 2 studies = 8; 1 study = 7; 1 study = 6
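
The two views of Table 1 (points awarded per criterion and total score per study) can be cross-checked with simple arithmetic, since both should add up to the same number of points. The sketch below uses only the counts reported in the table; the variable names are illustrative.

```python
N_STUDIES = 37

# "Yes" counts per criterion, copied from Table 1.
yes_counts = {
    "A clear statement of the research aims": 37,
    "Appropriate methodology": 36,
    "Appropriate research design": 35,
    "Appropriate recruitment strategy": 37,
    "Appropriate data collection": 37,
    "The relationship between the researcher and participants described": 35,
    "Ethical issues considered": 35,
    "Sufficient data analysis": 36,
    "A clear statement of findings": 34,
    "The study is valuable": 37,
}

# Total-score distribution from the last row of Table 1: score -> number of studies.
score_distribution = {10: 33, 8: 2, 7: 1, 6: 1}

# Points summed down the criteria column and across the per-study totals.
points_by_criterion = sum(yes_counts.values())
points_by_study = sum(score * n for score, n in score_distribution.items())
studies_counted = sum(score_distribution.values())

print(points_by_criterion, points_by_study, studies_counted)  # 359 359 37
print(round(points_by_study / N_STUDIES, 2))  # mean CASP score per study
```

Both totals come to 359 of a possible 370 points, so the criterion-level counts and the per-study score distribution in Table 1 agree with each other.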

Results

Study characteristics

A total of 37 empirical studies on AI in ICU nursing were included, published between 2020 and 2025 (Table 2). The number of publications peaked in 2025, reflecting a heightened global interest in applying AI to critical care nursing in the wake of the COVID-19 pandemic. The research is international in scope, with notably strong representation from Asia and North America. By country, the United States contributed 17 studies; China 12 (with one additional study from Taiwan); South Korea 6; Italy 3; and the Netherlands 3. The United Kingdom and Thailand each contributed two studies, while Australia, Germany, Spain, Switzerland, New Zealand, and Turkey contributed one each. In total, 10 studies were multinational, including collaborations such as China–United States (n = 2), South Korea–United States (n = 2), Italy–Netherlands–United States (n = 2), Netherlands–United States (n = 1), United Kingdom–United States (n = 1), Switzerland–United States–New Zealand (n = 1), and South Korea–United States–Thailand (n = 1).

Table 2.

Summary of included studies on artificial intelligence applications in ICU nursing.

Clinical applications	Country	Study aim/objective	Study design	Clinical setting	Participants / users	AI type / techniques used	AI function/application	Data source	Outcomes / benefits reported	Implementation status	Challenges or limitations	Nursing-specific relevance	Reference
Early Warning Systems and Risk Prediction	USA	To develop a model predicting ICU transfer within 24 h among hospitalized COVID-19 patients.	Retrospective cohort study	Mount Sinai Hospital, general wards and ICU	1987 COVID-19 inpatients	Random Forest	ICU transfer risk prediction within 24 h	Hospital EHR (vital signs, labs, ECGs, nursing assessments)	AUC = 0.799, Sensitivity = 72.8%, Specificity = 76.3%	Retrospective analysis	Pandemic-era data, generalizability to non-COVID settings unclear	Moderate—includes nursing assessments as input	¹⁶
	China (Taiwan)	To predict ICU transfer within 24 h in pediatric pneumonia patients using ML.	Retrospective single-center cohort	National Taiwan University Hospital	8464 pediatric patients	Random Forest, XGBoost, logistic regression	Predict ICU need within 24 h of hospitalization	Clinical EHR features within 24 h	RF AUC = 0.99; high precision and recall	Model validated, not deployed	Single center; pediatric focus	High—helps nurses identify deterioration in pediatric pneumonia cases	¹⁷
	South Korea	To prospectively validate DeepCARS for predicting IHCA or unplanned ICU transfer (UIT) in general ward patients.	Prospective, multicenter cohort	4 teaching hospitals, South Korea	55,083 adult patients admitted to general wards	Deep learning-based risk prediction (DeepCARS), compared with MEWS, NEWS	Predict IHCA and UIT within 24 h of deterioration	Vital signs, lab values, early warning scores	DeepCARS AUROC = 0.869 vs. MEWS (0.756) and NEWS (0.767); fewer false alarms	Prospectively validated and dashboard deployed	Alerts not always acted on; needs integration into workflows	High—improves detection of deterioration using vital signs tracked by nurses	¹⁸
	China	To predict delirium in critically ill children 24 h after PICU admission using ML.	Prospective cohort study	57-bed PICU, large academic medical center	1576 critically ill children	XGBoost, Logistic Regression, others	Prediction of delirium 24 h after admission	Medical and nursing records, bedside nurse delirium screening	XGBoost AUC = 0.805, LR model also strong (AUC = 0.789)	Prospective validation complete	Lower performance in patients <24 months old	High—delirium scored by bedside nurses; model supports nursing interventions	¹⁹
	Switzerland, USA, NZ	To develop early warning systems (circEWS) for circulatory failure using ML in ICU patients.	Retrospective multicenter model development and external validation	Multidisciplinary ICU; HiRID (Switzerland), MIMIC-III (external)	36,098 ICU admissions	Gradient Boosted Trees (LightGBM), SHAP, ensemble learning	Early prediction of circulatory failure (within 8 h)	HiRID (240 patient-years, 3B data points), MIMIC-III for validation	AUROC = 0.94; predicted 90% of events, 82% > 2 h before onset	Externally validated; not deployed clinically	Data-intensive, requires frequent high-resolution monitoring	Moderate—supports early detection of circulatory instability	²⁰
	China	To develop ML models predicting mortality risk in ICU COVID-19 patients.	Retrospective model development and validation	Vulcan Hill Hospital ICU, Wuhan	123 ICU patients with confirmed COVID-19	XGBoost, Logistic Regression, SHAP, LIME	Prognosis and mortality prediction	Hospital EMR, 100 candidate features from labs, vitals, nursing care, etc.	AUC = 0.92 (validation); strong calibration and explainability	Model validated and deployed as web tool	Small sample size, early pandemic setting	Moderate—includes nursing features and may inform ICU care planning	²¹
	South Korea, USA, Thailand	To use unsupervised learning to identify ICU patient subgroups and compare clinical features and outcomes.	Exploratory, unsupervised clustering analysis	ICU of academic medical center, USA	1503 ICU encounters	K-means clustering	Subgroup discovery based on lab patterns	ICU EMR (labs: BUN, creatinine, WBC, RBC, etc.)	3 patient clusters with distinct mortality and treatment profiles	Exploratory phase; no real-time deployment	Only 9 lab features; retrospective design	Moderate—informs resource use and subgroup nursing planning	²²
	USA, UK	To predict AKI in pediatric ICU patients up to 48 h before KDIGO diagnosis using ML.	Multicenter retrospective cohort with prospective-ready design	PICU/CTICU across 3 hospitals (USA, UK)	16,863 pediatric ICU patients (1 month–21 years)	Ensemble ML, time-series features, explainable AI	Prediction of moderate–severe AKI (Stage 2/3) within 48 h	EHR from 3 hospitals; creatinine trends, demographics, meds	AUC = 0.89; 30 h median lead time; 47% PPV; actionable alerts	Validated; may support clinical decision-making	Need for real-time infrastructure; generalization not proven	High—enables early prevention through nursing action (fluids, nephrotoxin avoidance)	²³
	China	To develop an ML model predicting VAP 24 h before diagnosis in ICU patients on mechanical ventilation.	Retrospective cohort (MIMIC-III), model comparison with CPIS	ICU patients from MIMIC-III with mechanical ventilation >24 h	10,431 patients	Random Forest	Early detection of ventilator-associated pneumonia	MIMIC-III: 42 features (demographics, APACHE, labs, procedures)	AUC = 0.84; sensitivity 74%, specificity 71%; outperformed CPIS	Internal validation only	No external validation; requires structured ICU data input	High—potential for VAP risk alerts and early nursing interventions	²⁴
	USA	To develop and validate static and dynamic ML models for predicting ICU delirium onset using EHR data.	Retrospective multicenter model development and external validation	ICU; MIMIC-III, MIMIC-IV, eICU databases	Over 60,000 ICU admissions across multiple datasets	Static and dynamic ML models; AUC and calibration metrics	Predict onset of ICU delirium up to 12 h in advance	EHR from multiple public ICU datasets (MIMIC, eICU)	Dynamic model AUC = 0.845; accurate early prediction of delirium	Externally validated; open-source code available	Generalizability across settings not fully confirmed	Moderate—supports early nurse-led delirium interventions	²⁵
	China	To predict AKI risk in critical care patients with acute cerebrovascular disease using ML.	Retrospective dual-cohort analysis (internal + external)	ICU; MIMIC-III and Chinese hospital database	3434 ICU patients with cerebrovascular disease	XGBoost, RF, LR, others	Risk prediction of AKI	Labs, demographics, vital signs from EHR	Best model AUC = 0.880; accurate risk stratification	Validated on external cohort	AKI definition differences, dataset bias	Moderate—relevant for renal nursing monitoring	²⁶
	China	To construct interpretable ML models for predicting VTE in critically ill patients.	Retrospective cohort using eICU database	Multicenter ICU (207 centers)	109,044 ICU patients	Random Forest, SVM, XGBoost; DALEX for interpretability	Prediction of ICU-acquired VTE	eICU Collaborative Research Database v2.0	RF AUC = 0.9378; high precision and balanced accuracy	Model internally validated	No clinical deployment; large-scale data curation	High—relevant for thrombosis prevention nursing in ICU	²⁷
	Spain	To develop ML models predicting extubation success or failure in ICU patients receiving invasive mechanical ventilation.	Retrospective single-center cohort study	30-bed polyvalent ICU, Spain	Adult ICU patients (2015–2019) with ≥12 h invasive mechanical ventilation	Support Vector Machine, Gradient Boosting, Logistic Discriminant Analysis	Prediction of extubation outcome	Clinical Information System (GE Centricity), including monitor, respiratory logs, demographics	SVM achieved 94.6% accuracy in predicting extubation success; outperformed SBT	Model development phase only	Single center, no external validation, retrospective data	Indirect—supports respiratory care decisions in ICU	²⁸
	China	To develop interpretable RNN models (LSTM, GRU) for dynamic prediction of extubation failure risk.	Retrospective MIMIC-IV cohort	ICU, invasive mechanical ventilation patients	8599 patients; 30.3% had EF	LSTM, GRU, SHAP; compared with RF, XGB, SVM, etc.	Predict extubation failure (reintubation/death within 48 h)	MIMIC-IV time series; 4-h windowed features	AUROC = 0.828 (LSTM); interpretable SHAP analysis	Retrospective; not yet integrated into care	No real-time testing; high feature engineering effort	High—supports extubation readiness decision and risk alerting	²⁹
	Republic of Korea	To create a real-time delirium prediction model for ICU patients using continuous physiological waveforms plus demographics, with internal, temporal, and external validation.	Retrospective development; internal and temporal validation at one center; external validation at a second hospital; CAM-ICU reliability checked prospectively.	ICU (adult).	Development: 5478 records (651 pts) + temporal set 4438 records; external validation: 670 patients.	Random Forest on ECG/PPG/respiratory signals	ICU delirium prediction during stay (near-real-time updates).	ICU monitoring waveforms + EHR; CAM-ICU labels (κ = 0.81 nurse vs research verification).	AUROC 0.82 (internal), 0.73 (temporal), 0.84 (external); AUPRC 0.62/0.85/0.77; positive net benefit across thresholds; rising scores approaching diagnosis time.	Development + external validation (no live deployment).	Sensor noise and device interoperability; integration workload.	High—focuses on nurses’ documentation and clinical judgment (supports continuous surveillance and bundle activation).	³⁰
	Multinational (US/Italy/NL)	To develop and externally validate a model for persistent AKI stage-3 ≥ 72 h during ICU stay across multiple international datasets.	Retrospective, multicohort development + external validation; explainable tree regressor; calibrated pipeline.	ICU (adult).	7759 ICU patients with KDIGO AKI 2–3; development on single-center cohorts; external on multicenter (eICU, GiViTI).	Explainable tree-based model; trends + calibration	Real-time risk of persistent AKI-3 (≥72 h or death/RRT composite).	MIMIC-III, AmsterdamUMCdb (dev/val); external: eICU-CRD, GiViTI MargheritaTre.	External AUROC 0.94 (US) / 0.85 (Italy); shows cross-system transportability; potential to guide early nephrology interventions.	Development + external validation (no live deployment).	Endpoint harmonization across databases; site heterogeneity.	Not directly focused on nurses, but relevant for ICU care planning and workload management (fluids, nephrotoxin review, RRT readiness).	³¹
	Republic of Korea (dataset: USA)	To build an ensemble ML model that predicts weaning within 14 days using data captured before or ≤24 h after intubation.	Retrospective single-database development with 5-fold CV; comparator to SAPS-II/SOFA.	ICU (adult).	23,242 IMV patients from MIMIC-IV; 19,025 (81.9%) successfully weaned ≤14 d.	Ensemble: CatBoost / RF / regularized LR; SHAP	Weaning success ≤14 days (pre-/early-intubation window).	MIMIC-IV (2008–2019).	AUC 0.861 (ensemble), superior to SAPS-II (0.749) and SOFA (0.588); lactate and anion gap among top features.	Development + internal CV only.	Single-center DB; no external validation; outcome definition nuances.	Moderate—includes nursing assessments as input (early variables align with nursing assessments/triage for ventilatory care).	³²
	Republic of Korea + USA (external)	To develop iREAD, an explainable ensemble model predicting ICU readmission ≤48 h at the moment of ICU discharge and compare with traditional scores.	Retrospective multicenter development with external validation (MIMIC-III, eICU-CRD); survival analysis by risk strata.	ICU (adult, medical & surgical; discharge-time assessment).	SNUH dev cohort 70,842; external: MIMIC-III 43,237 and eICU-CRD 90,271 admissions.	Ensemble on 30 routine variables; interpretable	ICU readmission risk ≤48 h (also late and overall).	SNUH EHR (dev); MIMIC-III, eICU-CRD (external).	Internal AUROCs 0.771 (≤48 h) / 0.834 (>48 h) / 0.820 (overall); external AUROCs 0.768 (MIMIC-III) & 0.725 (eICU-CRD); outperformed SWIFT/other baselines.	Development + external validation; not yet prospectively deployed.	External performance drop; needs workflow integration and impact study.	Indirect—relevant for discharge planning and ICU resource management (step-down readiness, monitoring orders).	³³
	USA	To develop and validate machine learning models predicting ICU mortality and length of stay using patient vital signs.	Retrospective cohort study using MIMIC database	Adult ICU, MIMIC-III	Adult ICU patients (MIMIC-III)	Random Forest, Logistic Regression, others (6 classifiers)	Prediction of ICU mortality and length of stay	MIMIC-III (v1.4) database, vital signs and demographics	Best mortality prediction accuracy ∼89% (RF); LOS prediction ∼65% (RF)	Model development and validation stage only	No integration with live clinical systems, only retrospective data	Not directly focused on nurses, but relevant for ICU care planning and workload management	³⁴
	China	To predict prolonged ICU and hospital stays in spinal cord injury patients using ML classifiers.	Retrospective cohort study using eICU and MIMIC databases	ICU for spinal cord injury patients	1599 critical SCI patients	91 ML classifiers; final ensemble of top 3 models	Prediction of prolonged ICU and hospital stay	eICU and MIMIC databases	AUC for prolonged ICU stay prediction = 0.864 (CV), 0.802 (test set)	Algorithm development and validation	Need for generalization beyond SCI-specific context	Indirect—relevant for discharge planning and ICU resource management	³⁵
	China	To develop an ML-based nomogram predicting ICU pressure injury risk using EMR data.	Retrospective cohort study	ICU, 60-bed unit	618 ICU patients	Logistic Regression, Decision Tree, Random Forest	Risk prediction of pressure injury in ICU patients	ICU EMR: lab results, Braden scale, comorbidities	Combined ML + Braden model AUC = 0.87 (train), 0.84 (test)	Model development; not yet deployed	Single-center, retrospective; limited real-time application	High—focuses on nursing outcomes and documentation	³⁶
	Australia	To develop an early model at ED arrival/early encounter that predicts direct ICU admission and estimates automated referral volume.	Retrospective cohort; model development with train/validation/test splits.	Emergency department (pre-ICU).	484,094 adult ED presentations; 3955 (0.82%) direct ICU admissions.	Gradient-boosted trees on EMR; optional NLP	ICU admission/transfer prediction to support early escalation.	Health-system ED EMR.	AUROC ≈ 0.92; with ≥75% risk threshold, estimated 2.7 automated ICU referral triggers/day; strong drivers: triage category, ambulance arrival, qSOFA, baseline HR, prior-year admissions.	Development + internal testing/validation (no live deployment).	Severe class imbalance; site transportability and threshold calibration for alert fatigue.	Indirect—relevant for discharge planning and ICU resource management (early bed/staff preparation).	³⁷
	Türkiye (hospital registry)	To build ML models for ICU admission and ICU length of stay in hospitalized COVID-19, incorporating CT severity scores.	Retrospective registry analysis; model development with internal validation.	Inpatient wards / hospital-wide (pre-ICU endpoints).	6854 COVID-19 hospitalizations reviewed; 815 used for modeling (185 ICU admissions); synthetic minority oversampling for imbalance.	kNN (best) + Boruta; SMOTE	ICU admission risk and ICU-LOS estimation.	Hospital registry (clinical + labs ± CT-SS).	Acceptable discrimination for ICU admission; identification of key prognostic factors; LOS modeled with correlation/MAE/RMSE metrics.	Development + internal validation only.	Pandemic-era bias; single-region data; augmented positives via SMOTE.	Indirect—relevant for discharge planning and ICU resource management (anticipates ICU bed/vent needs).	³⁸
	USA/China (MIMIC-IV v2.2)	To build an interpretable ML model for early ICU delirium prediction in older adults (≥65 years) and explain key drivers.	Retrospective observational cohort; derivation/validation in MIMIC-IV v2.2; TRIPOD adherence.	ICU (adult; geriatrics focus).	9748 ICU patients ≥65 y; 26 features selected.	XGBoost (best); SHAP explainability	Delirium risk stratification early after ICU admission.	MIMIC-IV v2.2 ICU EHR.	Validation AUC 0.810; best training AUC 0.836; SHAP top features: GCS, mechanical ventilation, sedation; good calibration and decision-curve net benefit.	Development + internal validation.	Single database; no prospective impact; geriatrics only.	High—focuses on nurses’ documentation and clinical judgment (targets surveillance, sedation review, nonpharm prevention).	³⁹
	China & USA (multi-database)	To develop and externally validate across four healthcare systems interpretable ML models for in-hospital mortality in ICU pneumonia, identifying consistent high-value predictors.	Retrospective multicenter development with multidatabase external validation and SHAP interpretability.	ICU (adult pneumonia).	25,783 ICU pneumonia patients total: train on MIMIC-IV (9410); external on MIMIC-III (2487), eICU (13,541), and prospective FAHZU cohort (345).	XGBoost (best); Boruta + SHAP	ICU/hospital mortality risk stratification for pneumonia.	MIMIC-IV (train), MIMIC-III & eICU & FAHZU (external).	XGBoost training AUC 0.747; external AUCs 0.670–0.695; top features consistent (platelets, BUN, age, vital signs).	Model development + multidatabase external validation; not deployed.	Moderate AUCs externally; heterogeneity across systems; ICD-based cohorting.	Not directly focused on nurses, but relevant for ICU care planning and workload management (surveillance intensity, staffing).	⁴⁰
	China (MIMIC-IV v2.2)	To build an explainable XGBoost + SHAP model for pressure injury prediction among mechanically ventilated ICU patients and identify key risk factors.	Retrospective development with internal split validation.	ICU (adult, invasive mechanical ventilation).	29,448 MV ICU patients; 2052 had pressure injuries; 70/30 train-validation split.	Tuned XGBoost; SHAP factor ranking	Pressure injury risk during ICU stay.	MIMIC-IV v2.2.	Train AUC 0.797; internal validation AUC 0.739; top factors: sepsis, age, platelets, ICU LOS, P/F ratio, Hb, albumin, renal disease.	Model development + internal validation; not yet live.	Single database; stage distribution skew; need prospective impact evaluation.	High—focuses on nurses’ documentation and clinical judgment (turning schedules, device padding, early prevention bundle).	⁴¹
	Republic of Korea + international datasets	To create iMORS, a real-time 24-h mortality predictor using minimal common variables and validate internationally.	Retrospective development with temporal internal validation and external validation across three open ICU datasets.	ICU (adult).	SNUH internal cohort (multi-ICUs); external: MIMIC, eICU-CRD, AmsterdamUMCdb.	Ensemble DL + LightGBM (minimal set)	Short-term (24 h) mortality risk during ICU stay.	SNUH EHR; external open ICU databases.	Internal AUROC 0.964; external AUROCs 0.890 (MIMIC), 0.886 (eICU-CRD), 0.870 (AmsterdamUMCdb); all > NEWS.	Development + external validation; real-time capable but not yet live.	Harmonizing variables across systems; transportability considerations.	Not directly focused on nurses, but relevant for ICU care planning and workload management (hourly surveillance prioritization).	⁴²
	USA (eICU → MIMIC-IV external)	To develop an explainable pseudo-dynamic ML framework (XMI-ICU) that predicts ICU mortality in MI 6–24 h ahead with time-resolved SHAP explanations.	Retrospective two-cohort study with held-out test and external validation.	ICU (adult myocardial infarction).	eICU development/test; external validation on MIMIC-IV.	Sliding-window XGBoost; time-resolved SHAP	6/12/18/24-h ahead ICU mortality prediction.	eICU (held-out test) → MIMIC-IV (external).	AUROC 0.920 for 6-h horizon; robust across horizons; external validation successful; decision-curve analysis > APACHE-IV.	Development + external validation; not deployed.	Horizon-dependent features; MI-specific cohort; generalizability.	Not directly focused on nurses, but relevant for ICU care planning and workload management (cardiac ICU escalation/monitoring).	⁴³
	Netherlands (dev) → USA multi-center (transport)	To evaluate multimodal CNNs (clinical variables + notes + concept embeddings) for early ICU AKI prediction and test transport using variables-only model.	Retrospective development with 5 × 2 CV; variables-only branch externally transported to eICU.	ICU (adult).	MIMIC-III 44,303 stays (dev); eICU 142,253 stays (transport).	Multimodal CNN (variables + notes + concepts)	AKI during ICU stay (after first 48 h).	MIMIC-III clinical vars + nursing/physician notes; eICU for external transport (vars-only).	Internal AUROC 0.73–0.90; variables-only transported 0.68–0.77 to eICU; showed value of text/concepts when labs (SCr/UO) excluded.	Development + internal CV; external transport (vars-only).	Note quality/noise; incorporation bias if using SCr/urine; domain shift.	Moderate—includes nursing assessments as input (notes pipelines include nursing documentation).	⁴⁴
	USA (two children's hospitals)	To create an expert-augmented ML (EAML) model that predicts successful extubation in PICU and improves generalizability by embedding clinician knowledge into model rules.	Two-site EHR study; internal split at Site A and external test at Site B; experts reviewed RuleFit rules; TRIPOD-AI reporting.	PICU (pediatric ICU).	Intubated children >30 days and <18 years; Site A for train/test; Site B for external test; 25 PICU clinicians surveyed for rule vetting.	RuleFit → expert-augmented ML (EAML)	Extubation readiness (success = no reintubation ≤48 h).	Two institutional PICU EHRs (Epic); 98 variables aggregated in 4-h windows.	Internal AUC: RuleFit 0.817 vs EAML 0.814 (ns); external AUC improved with EAML 0.799 vs 0.791; demonstrates better transportability.	Model developed and externally tested (not yet deployed live).	Expert survey burden; single health-system; pediatric specificity.	High—focuses on nurses’ documentation and clinical judgment (supports daily extubation huddles and readiness checks).	⁴⁵
Nursing Decision Support	USA (two ICUs in one system)	To predict next-day extubation using overnight (00:00–08:00) EHR features and compare across model families with external testing.	Single-center prospective registry (MICU) for train/val/test + external test at community ICU; multiple encodings and algorithms.	ICU (adult, MICU).	Internal 448 pts / 3095 ICU-days; external 333 pts / 2835 ICU-days.	LSTM (best) for next-day extubation	Extubation readiness—next calendar day.	Health-system ICU EHR; curated/validated ventilation events.	AUC 0.870 internal and 0.870 external; 63.8% of model-predicted extubations occurred within 3 days of actual extubation.	Development + external testing (not yet live).	Generalizability beyond MICU; adherence to SAT/SBT protocols.	High—focuses on nurses’ documentation and clinical judgment (supports nurse-led SAT/SBT planning and bed management).	⁴⁶
	Italy (dev US) → Netherlands external	To deliver real-time IMV weaning readiness predictions at 24/48/72 h and test across datasets/subgroups.	Retrospective development + external validation with sensitivity analyses (age, admission reason, neuro).	ICU (adult).	8565 MIMIC-IV (dev/val) and 2626 AmsterdamUMCdb (external).	XGBoost; SHAP; bedside-available variables	Weaning readiness in next 24/48/72 h.	MIMIC-IV (dev/internal); AmsterdamUMCdb (external).	External AUROC 0.847/0.795/0.789 (24/48/72 h); sensitivity >0.75; specificity decreases as horizon lengthens.	Development + external validation (not yet live).	Lower performance in elderly and neurosurgical cohorts; dataset harmonization.	High—focuses on nurses’ documentation and clinical judgment (timing SBT/SAT, step-down coordination).	⁴⁷
	Italy	To rank individual nursing diagnoses (NDs) by their predictive relevance for ICU transfer in adult and pediatric patients.	Retrospective, monocentric observational study	Tertiary hospital with ICU; adult and pediatric wards	42,735 hospitalized patients; nurses using EHRs	Random Forest	Predictive ranking of nursing diagnoses associated with ICU transfer	EHR (PAI, PAIped, HDR), Italy	Identified high-risk NDs (e.g., acute pain, airway clearance impairment) for ICU transfer	Retrospective model development	Data completeness, variation in ND use across institutions	High—focuses on nurses’ documentation and clinical judgment	⁴⁸
	Thailand	To compare ML against standard triage for predicting ICU admission at triage using structured data plus free-text chief complaints.	Retrospective single-center development with internal validation.	Emergency department (pre-ICU).	163,452 adult ED visits.	XGBoost + triage-text embeddings	ICU admission prediction (disposition-time horizon).	ED EHR + triage text (nurse-entered).	AUROC 0.917 and AUPRC 0.629, outperforming CTAS (0.882/0.333); key predictors: arrival mode, age, chief-complaint embeddings.	Development + internal validation (TRIPOD-AI reporting).	Single-site; language/text variability; prospective/ external validation pending.	Indirect—relevant for discharge planning and ICU resource management (faster escalation/notification).	⁴⁹
Workload and Documentation Support	USA	Align a lightweight LLM to ICU heart-failure nursing documentation quality via DPO to reduce documentation burden and improve note quality.	Preprint; retrospective model training & expert evaluation	ICU (critical care), nursing documentation	8838 HF nursing notes; 21,210 preference pairs	DPO fine-tuning of Mistral-7B; BLEU/ROUGE/BERTScore + expert review	Documentation quality alignment; automated checks	MIMIC-III nursing notes	↑BLEU 84% (0.173→0.318), ↑BERTScore 7.6%	Research stage; locally deployable model suggested	Generalizability; reliance on MIMIC-III; preference data creation	High—directly targets nursing notes quality & burden in ICU	⁵⁰
	UK (Scotland)	Forecast future ICU bed availability (1/7/14 days) using bed-management data (no patient-level data).	Single-center feasibility; retrospective analysis of prospectively collected ops data	PICU within tertiary hospital; hospital-wide bed board data	Hospital departments’ bed usage snapshots (2012–2019)	Regression & classification; interpretable models	Resource/bed capacity forecasting	Routine bed management data (arrivals/transfers/discharges)	AUC 0.78 (1-day availability); MAE: 1.33 (1 day) vs 1.61 (14 days)	Feasibility shown; future multicenter work needed	Data quality; single-center; temporal resolution changes	High—supports staffing & elective scheduling; reduces cancellation risk	⁵¹
	Germany	Predict ICU bed occupancy to enable integrated OR scheduling (tactical planning).	Real-world retrospective cohort; 7-year dataset	Tertiary hospital; OR + downstream ICU	∼77k surgical patients, 7 years	Neural networks (vs baselines)	Capacity/bed-impact prediction for OR planning	Electronic hospital records (paths/LOS etc.)	Outperformed state-of-the-art in predicted beds (+43%); ICU demand ↓8.9% with optimized schedule	Tooling & optimization proof-of-concept	Site-specific tuning; memory-depth parameterization	High—informs staffing/bed leveling affecting ICU nursing workload	⁵²

In terms of study design, the majority of the included studies were quantitative observational analyses. Because design categories were not mutually exclusive, a study could be counted under more than one tag: 33 studies employed retrospective cohort or case–control designs, utilizing existing critical care databases or EHR data. These investigations commonly leveraged large publicly available ICU datasets such as MIMIC-III/IV (n = 18) or eICU (n = 9) for model development, validation, and performance benchmarking. A smaller group of eight studies adopted prospective approaches, including cohort validations and implementation trials, to evaluate AI models in real-world or near-real-time settings.

Regarding clinical setting, population focus varied. A total of 32 studies targeted adult ICUs or covered mixed populations. In contrast, only five studies addressed pediatric critical care (PICU), typically aiming to predict complications or clinical deterioration in critically ill children.^{17,19,23,45,51} No studies specifically focused on neonatal ICUs (NICUs), highlighting a clear gap in AI research for neonatal critical care nursing.

Artificial intelligence methods used

Across the studies, a variety of AI and machine learning (ML) techniques were employed, with a clear dominance of data-driven modeling approaches. Traditional ML algorithms were widely used, often for predictive modeling tasks using structured clinical data such as vital signs, lab values, and nursing assessments. In particular, gradient boosting methods (e.g., XGBoost/GBM/LightGBM/CatBoost) appeared in 15 studies.^{17,19–21,26–28,32,39–43,45,49} Random forest was used in 14 studies.^{16,17,24,26,27,30–33,35,37,40,43,50} Logistic regression appeared in 7 studies,^{17,19,21,26,32,34,36} typically serving as an interpretable baseline for comparison. Other methods were less common: support vector machines in 3,^{27–29} neural networks/DL in 4,^{18,29,44,46} decision trees in 1,³⁶ and k-nearest neighbors in 1.³⁸ Several papers benchmarked multiple models in parallel, and tree-based ensembles (gradient boosting and random forest) were most frequently reported as top performers for structured ICU data.

Among the four deep-learning (DL) studies, authors used architectures that are well-suited to time-series and high-dimensional ICU data. Two studies built recurrent neural network/long short-term memory models to capture the temporal evolution of physiologic signals and ventilatory status for sequential risk prediction, estimating extubation failure and next-day extubation readiness, respectively, in critically ill patients.^{29,46} One study developed a multimodal convolutional neural network (CNN) framework that fused structured clinical variables with other data sources to predict acute kidney injury (AKI) in the ICU, illustrating DL's capacity for feature learning across heterogeneous inputs.⁴⁴ Another study reported a prospective, multicenter evaluation of a DL-based early-warning system for clinical deterioration (e.g., cardiac arrest/ICU transfer) in real-world wards, directly informing ICU escalation pathways and nursing workflows.¹⁸

Applications of natural language processing (NLP) to unstructured clinical text were uncommon but present in four studies. These works combined narrative notes with structured EHR variables to support prediction and triage. For example, two studies incorporated triage free text or nurses’ narratives alongside vitals and labs to estimate acuity or deterioration risk.^{37,49} Others used multimodal pipelines that fused text with tabular inputs, including a CNN-based framework for ICU AKI risk,⁴⁴ or aligned a language model to ICU nursing documentation.⁵⁰ By contrast, the majority of studies relied primarily on structured EHR data, underscoring that text-based ML is emerging but not yet mainstream in ICU nursing research.

Clinical applications

Early warning systems and risk prediction

A total of 30 studies focused on prediction models designed to provide early warnings of patient deterioration or other critical events such as ICU transfer/admission, mortality, delirium, extubation/weaning outcomes, and resource-related endpoints (length of ICU stay or ICU readmission). This represents the most prominent area of AI application in ICU nursing to date. These models predominantly utilized structured clinical data—including vital signs, laboratory results, demographic information, and nurse-entered documentation—to identify high-risk patients and support timely clinical decision-making.

Intensive care unit admission or transfer prediction was a common target. Several studies applied ML models—particularly random forests and logistic regression—to identify patients at risk of clinical deterioration requiring ICU transfer. For instance, Cheng et al.¹⁶ developed a random forest model using routinely collected EHR data from COVID-19 inpatients and achieved an AUC of 0.799 for predicting ICU transfer within 24 h, demonstrating the feasibility of such tools in real-time triage. In the emergency department (ED) setting, Pandey et al.³⁷ developed ML models to predict ICU admission directly from ED presentations, integrating triage and early clinical information to assist escalation decisions at the front door. Beyond routine tabular data, Zakariaee et al.³⁸ incorporated chest CT severity scores alongside clinical variables to model ICU admission (and length-of-stay [LOS]) in COVID-19 cohorts, highlighting the value of multimodal inputs for early risk stratification. In addition, a prospective, multicenter evaluation of a DL-based early-warning system demonstrated utility for detecting imminent deterioration with downstream ICU implications in real-world wards,¹⁸ underscoring the breadth of approaches used to anticipate ICU-level care needs.

Another major application was mortality prediction. Several studies developed and validated AI models to estimate ICU or in-hospital mortality, often comparing their performance to traditional scoring systems such as APACHE, SOFA, or MEWS. For example, Pan et al.²¹ used XGBoost and logistic regression to predict mortality in COVID-19 ICU patients and achieved an AUC of 0.92, outperforming conventional risk scores. Similarly, Alghatani et al.³⁴ developed multiple ML classifiers, with the random forest model reaching an accuracy of ∼89% for ICU mortality prediction based on MIMIC-III data. In addition to predictive performance, some tools (e.g., SHAP or LIME) also provided interpretable outputs, highlighting risk factors like lactate, oxygenation, and comorbidities—thus aiding nurses in clinical prioritization and end-of-life planning.
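Because discrimination is reported as AUC throughout the included studies, a minimal sketch of what the statistic measures may help readers interpret values such as 0.80 or 0.92. The function name `auc` and the toy labels and scores are illustrative assumptions, not data or code from any reviewed study. The sketch uses the pairwise (Mann-Whitney) form of the AUC: the probability that a randomly chosen patient with the event receives a higher risk score than a randomly chosen patient without it.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve in its pairwise (Mann-Whitney) form.

    Returns the fraction of positive/negative pairs in which the positive
    case has the higher score; tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up toy data (1 = event occurred, 0 = no event); not from any study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

The AUC depends only on how patients are ranked, not on whether the predicted probabilities are numerically accurate, which is why a high AUC on its own says nothing about calibration.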

Several studies targeted ICU readmission and LOS, operational endpoints central to capacity planning and discharge coordination. Most models drew on structured EHR variables (vital signs, labs, demographics, coded nursing data), sometimes augmented with imaging-derived severity scores. For ICU readmission, Lim et al.³³ developed and multicenter-validated a model to flag patients at risk of readmission within 48 h after ICU discharge, supporting safer step-down and follow-up planning. For LOS, Alghatani et al.³⁴ trained classifiers to estimate ICU LOS alongside mortality using routine EHR features; Fan et al.³⁵ identified prolonged ICU stay among patients with spinal cord injury using perioperative and clinical data; and Zakariaee et al.³⁸ modeled ICU admission and LOS in COVID-19 by combining clinical variables with chest CT severity scores. Collectively, these studies illustrate how ML can underpin operational decision-making in critical care, from discharge timing to bed management.

Nursing decision support

A total of four studies explored the use of AI as a decision aid for nursing-specific tasks. These tools were designed to support clinical judgment, triage assessment, documentation, and care planning—areas where timely and standardized decisions are essential for patient safety in ICU settings.

In the ventilation weaning/extubation domain, Fenske et al.⁴⁶ developed models to predict next-day extubation, enabling nurses to plan shift-to-shift readiness checks and coordinate team huddles; Zappalà et al.⁴⁷ proposed a real-time weaning-readiness predictor for invasively ventilated patients, offering moment-to-moment guidance that can structure nurse-led assessments. For nursing diagnoses and escalation cues, Cesare et al.⁴⁸ used random forest models to rank standardized nursing diagnoses by their predictive relevance for ICU transfer risk across adult and pediatric cohorts, providing data-driven prioritization signals for care plans. Triage decision support was addressed by Sitthiprawiat et al.,⁴⁹ who integrated nurse-captured triage assessments with structured EHR variables to identify patients at risk of critical outcomes (including ICU admission) and benchmarked the tool against conventional triage rules.

Workload and documentation support

A total of three studies focused on applying AI to support nursing workload management and clinical documentation, areas that directly influence ICU nursing efficiency and staff well-being.^{50–52} Unlike prediction models targeting patient outcomes, these studies centered on optimizing nurse-centered processes such as staffing, resource allocation, and structured record-keeping.

One study by Fan et al.⁵⁰ aligned language model–based methods with critical-care nursing documentation, illustrating how narrative nursing text can be structured and surfaced for downstream decision support without disrupting established charting practices. Moving from documentation to capacity planning, Palmer et al.⁵¹ demonstrated the feasibility of forecasting future critical-care bed availability using routine bed-management data, an approach that can inform staffing and resource allocation at the unit level. Complementing this, Schiele et al.⁵² developed neural-network models to predict ICU bed occupancy in support of integrated operating-room scheduling, highlighting how data-driven forecasts can bridge perioperative planning with ICU capacity.

Although these applications are still in early phases, they represent an emerging direction for AI in critical care—one that focuses not just on clinical outcomes but also on improving nurse workflow, documentation accuracy, and administrative efficiency. If successfully integrated into clinical systems, such tools have the potential to reduce clerical burden and free up more time for direct patient care.

Discussion

This narrative review reveals that the application of AI in ICU nursing is an emerging and rapidly evolving field, with most empirical studies published in the last five years. The focus has predominantly been on developing and validating predictive models that leverage large datasets of patient information (vital signs, assessments, EHR data) to assist with clinical predictions or decision support. Key areas of development include early warning scores for patient deterioration, predictive analytics for complications such as delirium or AKI, and decision support systems for nursing tasks such as triage prioritization and care planning. Collectively, the studies demonstrate that AI techniques—from traditional ML to DL—can achieve impressive accuracy in retrospective analyses. There is also a clear trend toward incorporating more sophisticated algorithms. These trends indicate a maturation of the research from purely technical proof-of-concept models toward more context-aware tools that consider usability in the nursing environment.

It is worth noting that implementation considerations for PICUs differ meaningfully from those for adult ICUs. Children have age-dependent physiologic norms (heart rate, respiratory rate, and blood pressure vary widely by age), so alert thresholds, feature sets, and model calibration cannot be directly transferred from adult models. Authoritative guidance provides age-specific acceptable ranges for unwell children and emphasizes that patterns of change matter as much as static cut-points, reinforcing the need for pediatric-specific tuning of ML systems. National PEWS programs also embed age-stratified observation charts and thresholds, underlining that pediatric early warning differs from adult EWS by design.^{53,54} Evaluation outcomes also differ. For early-warning systems outside the PICU, mortality is rare, and composite deterioration endpoints (e.g., unplanned PICU admission, urgent interventions) are more appropriate than mortality alone, which calls for pediatric-appropriate target selection in model training and validation.⁵⁵ Empirically, PICU-focused ML studies illustrate these distinctions. Pediatric AKI can be predicted 24–48 h earlier than guideline thresholds using physiologic time-series, highlighting distinct pediatric phenotypes and the value of sequential data.²³ Pediatric delirium risk models built on PICU cohorts similarly required pediatric variables and workflows, with tools intended for bedside nursing use.⁵⁶ Extubation planning in PICU has leveraged expert-augmented ML to encode clinician rules alongside data, reflecting pediatric-specific practice patterns and decision thresholds.⁴⁵ Finally, staffing, sedation, and care processes differ (e.g., pediatric-specific sedation protocols, workforce mix, and staffing ratios), which can influence data quality, label definitions, and alert acceptability; codesign with PICU nurses and local recalibration are therefore critical steps before deployment.⁵⁷ Future pediatric work should report age-appropriate calibration and use pediatric-suitable endpoints (e.g., composite clinical deterioration events), incorporate multicenter PICU cohorts, and involve bedside nurses in threshold setting and usability testing prior to EHR integration.

A notable strength in the current body of evidence is the international and interdisciplinary nature of the work. Studies from multiple countries have tackled similar problems, lending a global perspective on ICU nursing challenges that AI can address. Many researchers capitalized on open ICU databases (MIMIC, eICU, etc.) and multicenter cohorts, which increases the sample size and diversity of data used to train models. The result has been robust model performance in many cases, as well as publicly available algorithms or code in a few instances, which can accelerate collective progress. Another strength is the early attention to explainability and user-centered design in some studies, for instance, providing triage nurses with explanations for AI risk scores or designing AI systems explicitly to fit into nursing workflows.^{58,59} This indicates that some investigators appreciate that an accurate algorithm alone is not enough; it must also be interpretable and actionable for frontline nurses. Additionally, a few prospective studies and pilot implementations have been conducted, which is a critical step forward from purely retrospective research. The example of an AI-assisted triage intervention that successfully reduced mistriage in a live ED setting is an encouraging sign that these technologies can deliver real-world improvements when thoughtfully deployed.

Despite promising results, our review underscores significant limitations in the current evidence. Foremost, the level of clinical validation and implementation is limited. The vast majority of studies stopped at model development or retrospective validation stages. Only a couple of prospective trials were identified, meaning there is scant high-level evidence for actual patient outcomes or workflow improvements resulting from AI in ICU nursing. As a recent systematic review noted, the heterogeneity of study designs and lack of rigorous trials make it difficult to draw definitive conclusions about effectiveness.⁷ Future research needs to move beyond accuracy metrics and assess impact on clinical outcomes, nursing efficiency, and safety in real settings. Another limitation is that many models are context-specific and may not generalize well. For example, models trained on single-center or single-country data (or on a narrow patient group such as spinal cord injury ICU patients) may perform poorly elsewhere due to differences in patient populations, clinical practices, or data recording. Indeed, several studies themselves cite generalizability as a concern and often did not externally validate their algorithms. Data quality and completeness issues were also common challenges—for example, models relying on nursing documentation noted variability or missing data in those inputs.

Several AI directions beyond tabular prediction are directly relevant to ICU nursing. As noted above, NLP is being used to leverage nursing narratives and triage free text for risk assessment and workflow support, while multimodal and computer vision-adjacent approaches fuse structured variables with image-derived signals (e.g., CT severity indices; CNN-based frameworks) to enhance early recognition of organ dysfunction. In nursing education and simulation, evidence is accumulating for virtual simulation and adaptive tools that improve learning outcomes and support rehearsal of escalation pathways, suggesting a parallel route to uplift ICU nursing skills.⁶⁰ Nurse-assistive robotics (cobots) is an emerging strand for logistics and repetitive tasks; the supporting literature emphasizes the need for codesign with nurses and implementation studies before routine use in high-acuity settings.⁶¹

To translate these strands into safe routine practice, implementation should follow contemporary reporting/appraisal standards that make external validation and calibration a requirement for transportability, rather than relying on internal discrimination alone; TRIPOD + AI and PROBAST + AI explicitly extend earlier guidance to ML methods and emphasize transparent reporting, validation, and applicability judgments.⁶² In parallel, regulators have converged on lifecycle controls for EHR-embedded deployment, including real-time data feeds and monitoring for drift, through the joint FDA/Health Canada/MHRA Good ML Practice principles and their Predetermined Change Control Plan guidance, which tie postdeployment monitoring and change management to safe updates of ML devices. These controls are directly responsive to well-documented risks such as data drift and distribution shift in clinical ML.⁶³ Nurse-centered human-factors work is equally necessary. User-centered design, threshold setting, and alert governance are needed to mitigate the alarm burden and fatigue repeatedly documented in ICU settings.⁶⁴ Also, commissioning frameworks such as NICE's Evidence Standards Framework specify implementation and evidence expectations (including postdeployment monitoring) that nurse-led teams can adopt as a practical checklist for deployment readiness.⁶⁵
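To make concrete what reporting calibration alongside discrimination involves, the following is a minimal sketch under stated assumptions; it is not a procedure prescribed by the cited reporting or regulatory guidance. The function names `brier_score` and `calibration_bins`, the equal-width binning, and the toy data are all hypothetical choices for illustration. The sketch compares predicted risk with observed event rates, which is the core of a calibration table or plot.

```python
from typing import List, Sequence, Tuple


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def calibration_bins(
    labels: Sequence[int], probs: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width probability bins.

    Returns one (mean predicted risk, observed event rate, count) tuple per
    non-empty bin. In a well-calibrated model the first two values are close.
    """
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            obs_rate = sum(y for y, _ in members) / len(members)
            rows.append((mean_pred, obs_rate, len(members)))
    return rows


# Made-up toy predictions for six hypothetical patients; not study data.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_probs = [0.1, 0.1, 0.5, 0.5, 0.9, 0.9]
toy_brier = brier_score(toy_labels, toy_probs)
toy_bins = calibration_bins(toy_labels, toy_probs)
```

Running the same table on an external cohort, rather than only on the development data, is one simple way to see whether predicted risks remain trustworthy after transport to a new site.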

Limitations

This review has several limitations. First, it only includes studies published in English, which may exclude relevant research published in other languages. Second, the majority of the studies reviewed were retrospective and observational in nature, which limits the ability to draw definitive causal conclusions or assess the real-world impact of AI interventions. Additionally, there is significant methodological heterogeneity across the included studies, with varying AI techniques, datasets, and outcome measures, making it difficult to directly compare results. Another limitation is the lack of prospective validation studies, which are critical for establishing the clinical utility and generalizability of AI models. Finally, while we focused on empirical research, the fast-paced nature of AI advancements means that newer studies may not have been captured, potentially overlooking recent developments in the field.

Conclusion

This review adds a nursing-centered synthesis rather than a model-centered catalog. We curated a contemporary corpus of 37 empirical studies in ICU nursing and mapped applications across three practice domains that match bedside work, namely early warning and risk prediction, nursing decision support, and workload and documentation support. We quantified method use across the corpus, with gradient boosting in 15 studies, random forest in 14, and logistic regression in 7, and we identified four DL implementations with task-specific architectures. We characterized settings by population, showing five PICU studies and no NICU-specific studies, and by design, showing a predominance of retrospective analyses with limited external validation and calibration. We also provide a lightweight quality appraisal suited to ICU ML studies and convert the synthesis into actionable guidance on model selection for tabular versus temporal or multimodal data, on data preprocessing requirements, and on steps for embedding tools into the EHR with nurse codesign. Together these elements clarify where evidence is mature, where gaps remain, and how nurse-led teams can translate current tools into real-world workflows.

Artificial intelligence applications in ICU nursing are moving from concept to practice and already show promise in early warning, complication prediction, decision support, and workload streamlining. The evidence base is still largely observational with short follow-up, which limits generalizability and confidence in sustained benefit. To turn promise into dependable practice, use tree-based ensembles as strong baselines for structured EHR data and add recurrent or convolutional models for temporal or multimodal signals only after transparent calibration and external validation. Preprocessing should state how missing data are handled, how class imbalance is addressed, and how feature stability is checked. Models should also provide nurse-facing explanations that support prioritization without adding cognitive load. Implementation works best when tools are embedded in the EHR with auditable versioning, clear alert routing and suppression rules, and active monitoring for model drift, and when thresholds, screens, and workflows are codesigned with bedside nurses.
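As one illustration of what active monitoring for model drift could look like in practice, the sketch below computes a population stability index (PSI) for a single input feature, comparing its distribution at development time with a recent window. PSI is a technique swapped in here for illustration; the review does not specify a drift metric. The function name, the equal-width binning, the `eps` floor for empty bins, and the heart-rate values are assumptions, and the values are made up.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 4,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a current sample of one feature.

    Bin edges are equal-width over the reference range; current values
    outside that range are clamped into the first or last bin. Empty bins
    are floored at eps so the logarithm stays defined.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant reference feature: everything falls in bin 0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = max(0, min(idx, n_bins - 1))
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Made-up heart-rate samples (beats/min); not data from any reviewed study.
reference_hr = [60, 70, 80, 90, 100, 110, 120, 130]
shifted_hr = [v + 30 for v in reference_hr]
psi_same = population_stability_index(reference_hr, reference_hr)
psi_shifted = population_stability_index(reference_hr, shifted_hr)
```

A PSI of zero means the two distributions fall into the bins identically, and larger values indicate a larger shift. The value at which an alert should be raised is a local governance decision that would need to be agreed with the clinical and nursing teams.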

The field now needs multicenter prospective studies and pragmatic nurse-facing randomized trials using cluster or stepped-wedge designs. These studies should report both process outcomes such as documentation time, alarm exposure, escalation timeliness, and adherence to protocols, and patient or operational outcomes such as mortality, delirium, ICU transfer accuracy, LOS, readmissions, and bed flow. Models should undergo geographic and temporal validation, report calibration and fairness across subgroups, preregister analysis plans, and include surveillance after deployment for safety events, performance drift, and workload impact. Important gaps remain in pediatrics, especially NICU settings, and in the availability of international datasets. Future work should also include economic evaluations to inform scale-up. The foundation laid by these 37 studies can be strengthened through multidisciplinary collaboration and nurse leadership. With thoughtful integration, AI will not replace critical care nurses but will empower them to deliver smarter, more proactive, and patient-centered care.

Supplemental Material

sj-docx-1-dhj-10.1177_20552076251406302 - Supplemental material for Artificial intelligence applications in intensive care unit nursing: A narrative review (2020–2025)

Supplemental material, sj-docx-1-dhj-10.1177_20552076251406302 for Artificial intelligence applications in intensive care unit nursing: A narrative review (2020–2025) by Aiping Bi, Tie Li, Guohui Cheng and Jing Hu in DIGITAL HEALTH

Footnotes

ORCID iD

Aiping Bi

Contributorship

Aiping Bi: Conceived the study design, conducted the literature review, and wrote the manuscript. Tie Li: Assisted in data collection and analysis and contributed to manuscript revision. Guohui Cheng: Provided critical feedback on the methodology and interpretation of results. Jing Hu: Supervised the study, reviewed the manuscript, and coordinated the research.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

Data supporting the findings of this study are available within the article. All relevant data can be made available upon request to the corresponding author.

Guarantor

JH.

Supplemental material

Supplemental material for this article is available online.

References

1. Park, Chang, Kim. Artificial intelligence in critical care nursing: a scoping review. Aust Crit Care 2025; 38: 101225.

2. Yee. Clinical decision-making in the intensive care unit: a concept analysis. Intensive Crit Care Nurs 2023; 77: 103430.

3. Elhaddad, Hamam. AI-driven clinical decision support systems: an ongoing pursuit of potential. Cureus 2024; 16: e57728.

4. Zhou, Geng. Mapping artificial intelligence research trends in critical care nursing: a bibliometric analysis. J Multidiscip Healthc 2025; 18: 2799–2811.

5. Maleki Varnosfaderani, Forouzanfar. The role of AI in hospitals and clinics: transforming healthcare in the 21st century. Bioengineering (Basel) 2024; 11: 337.

6. Park, Chang, Kim. Artificial intelligence in critical care nursing: a scoping review. Aust Crit Care 2025; 38: 101225.

7. Porcellato, Lanera, Ocagli, et al. Exploring applications of artificial intelligence in critical care nursing: a systematic review. Nurs Rep 2025; 15: 468–487.

8. Iivanainen, Ekstrom, Virtanen, et al. Electronic patient-reported outcomes and machine learning in predicting immune-related adverse events of immune checkpoint inhibitor therapies. BMC Med Inform Decis Mak 2021; 21: 205.

9. Hassan, El-Ashry. Leading with AI in critical care nursing: challenges, opportunities, and the human factor. BMC Nurs 2024; 23: 752.

10. Porcellato, Lanera, Ocagli, et al. Exploring applications of artificial intelligence in critical care nursing: a systematic review. Nurs Rep 2025; 15: 55.

11. Levin, Suliman, Naimi, et al. Augmenting intensive care unit nursing practice with generative AI: a formative study of diagnostic synergies using simulation-based clinical cases. J Clin Nurs 2025; 34: 2898–2907.

12. Atalla ADG, El-Gawad mousa, Hashish EAA, et al. Embracing artificial intelligence in nursing: exploring the relationship between artificial intelligence-related attitudes, creative self-efficacy, and clinical reasoning competency among nurses. BMC Nurs 2025; 24: 661.

13. Pinsky, Bedoya, Bihorac, et al. Use of artificial intelligence in critical care: opportunities and obstacles. Crit Care 2024; 28: 113.

14. Hoelscher. The good, the bad, and the binary: ethical impact of AI on nursing practice. Nursing 2025; 55: 26–32.

15. Althobaiti, Almutairi, Muawwadh, et al. Experiences, challenges, opportunities of nursing interns in Saudi Arabia: a systematic review and synthesis of qualitative studies. BMC Med Educ 2025; 25: 1201.

16. Cheng, Joshi, Tandon, et al. Using machine learning to predict ICU transfer in hospitalized COVID-19 patients. J Clin Med 2020; 9: 1668.

17. Liu, Cheng, Chang, et al. Evaluation of the need for intensive care in children with pneumonia: machine learning approach. JMIR Med Inform 2022; 10: e28934.

18. Cho, Kim, Lee, et al. Prospective, multicenter validation of the deep learning-based cardiac arrest risk management system for predicting in-hospital cardiac arrest or unplanned intensive care unit transfer in patients admitted to general wards. Crit Care 2023; 27: 346.

19. Lei, Zhang, Yang, et al. Machine learning-based prediction of delirium 24 h after pediatric intensive care unit admission in critically ill children: a prospective cohort study. Int J Nurs Stud 2023; 146: 104565.

20. Hyland, Faltys, Huser, et al. Early prediction of circulatory failure in the intensive care unit using machine learning. Nat Med 2020; 26: 364–373.

21. Pan, Xiao, et al. Prognostic assessment of COVID-19 in the intensive care unit by machine learning methods: model development and validation. J Med Internet Res 2020; 22: e23128.

22. Hyun, Kaewprag, Cooper, et al. Exploration of critical care data by using unsupervised machine learning. Comput Methods Programs Biomed 2020; 194: 105507.

23. Dong, Feng, Thapa-Chhetry, et al. Machine learning model for early prediction of acute kidney injury (AKI) in pediatric critical care. Crit Care 2021; 25: 288.

24. Liang, Zhu, Tian, et al. Early prediction of ventilator-associated pneumonia in critical care patients: a machine learning model. BMC Pulm Med 2022; 22: 250.

25. Gong, Bergamaschi, et al. Predicting intensive care delirium with machine learning: model development and external validation. Anesthesiology 2023; 138: 299–311.

26. Zhang, Chen, Lai, et al. Machine learning for the prediction of acute kidney injury in critical care patients with acute cerebrovascular disease. Ren Fail 2022; 44: 43–53.

27. Guan, Chang, et al. Interpretable machine learning models for predicting venous thromboembolism in the intensive care unit: an analysis based on data from 207 centers. Crit Care 2023; 27: 406.

28. Fabregat, Magret, Ferre, et al. A machine learning decision-making tool for extubation in intensive care unit patients. Comput Methods Programs Biomed 2021; 200: 105869.

29. Zeng, Tang, Liu, et al. Interpretable recurrent neural network models for dynamic prediction of the extubation failure risk in patients with invasive mechanical ventilation in the intensive care unit. BioData Min 2022; 15: 21.

30. Park, Han, Jang, et al. Development and validation of a machine learning model for early prediction of delirium in intensive care units using continuous physiological data: retrospective study. J Med Internet Res 2025; 27: e59520.

31. Zappalà, Alfieri, Ancona, et al. Development and external validation of a machine learning model for the prediction of persistent acute kidney injury stage 3 in multi-centric, multi-national intensive care cohorts. Crit Care 2024; 28: 189.

32. Kim, et al. Machine learning algorithms predict successful weaning from mechanical ventilation before intubation: retrospective analysis from the medical information mart for intensive care IV database. JMIR Form Res 2023; 7: e46896.

33. Lim, Kim, Cho, et al. Multicenter validation of a machine learning model to predict intensive care unit readmission within 48 h after discharge. EClinicalMedicine 2025; 81: 103112.

34. Alghatani, Ammar, Rezgui, et al. Predicting intensive care unit length of stay and mortality using patient vital signs: machine learning model development and validation. JMIR Med Inform 2021; 9: e21347.

35. Fan, Yang, Liu, et al. Machine learning-based prediction of prolonged intensive care unit stay for critical patients with spinal cord injury. Spine (Phila Pa 1976) 2022; 47: E390–E398.

36. Chen, Deng, et al. Development and validation of a machine learning algorithm-based risk prediction model of pressure injury in the intensive care unit. Int Wound J 2022; 19: 1637–1649.

37. Pandey, Jahanabadi, D'arcy, et al. Early prediction of intensive care unit admission in emergency department patients using machine learning. Aust Crit Care 2025; 38: 101143.

38. Zakariaee, Naderi, Kazemi-Arpanahi. Development of machine learning prediction models to predict ICU admission and the length of stay in ICU for COVID-19 patients using a clinical dataset including chest computed tomography severity score data. Gazi Med J 2025; 36: 278–286.

39. Tang. Interpretable machine learning model for early prediction of delirium in elderly patients following intensive care unit admission: a derivation and validation study. Front Med (Lausanne) 2024; 11: 1399848.

40. Chen, Hou, Song. Development and multi-database validation of interpretable machine learning models for predicting in-hospital mortality in pneumonia patients: a comprehensive analysis across four healthcare systems. Respir Res 2025; 26: 279.

41. Zheng, Xue Y-J, Yuan Z-N, et al. Explainable SHAP-XGBoost models for pressure injuries among patients requiring with mechanical ventilation in intensive care unit. Sci Rep 2025; 15: 9878.

42. Lim, Gim, Cho, et al. Real-time machine learning model to predict short-term mortality in critically ill patients: development and international validation. Crit Care 2024; 28: 76.

43. Mesinovic, Watkinson, Zhu. Explainable machine learning for predicting ICU mortality in myocardial infarction patients using pseudo-dynamic data. Sci Rep 2025; 15: 27887.

44. Van Slobbe, Herrmannova, Boeke, et al. Multimodal convolutional neural networks for the prediction of acute kidney injury in the intensive care. Int J Med Inf 2025; 196: 105815.

45. Digitale, Franzon, et al. Expert-augmented machine learning for predicting extubation readiness in the pediatric intensive care unit. BMC Med Inform Decis Mak 2025; 25: 232.

46. Fenske, Peltekian, Kang, et al. Developing and validating machine learning models to predict next-day extubation. Sci Rep 2025; 15: 27552.

47. Zappalà, Scaravilli, Rovati, et al. Development and validation of a machine learning model for real-time prediction of invasive mechanical ventilation weaning readiness. J Crit Care 2025; 89: 155105.

48. Cesare, Nurchis, Nursing and Public Health Group, et al. Ranking nursing diagnoses by predictive relevance for intensive care unit transfer risk in adult and pediatric patients: a machine learning approach with random forest. Healthcare (Basel) 2025; 13: 1339.

49. Sitthiprawiat, Wittayachamnankul, Sirikul, et al. Development and internal validation of an AI-based emergency triage model for predicting critical outcomes in emergency department. Sci Rep 2025; 15: 31212.

50. Fan, Sun, Ashrafi, et al. Aligning language models with clinical expertise: DPO for heart failure nursing documentation in critical care. arXiv preprint 2025; arXiv: 2510.05410.

51. Palmer, Manataki, Moss, et al. Feasibility of forecasting future critical care bed availability using bed management data. BMJ Health Care Inform 2024; 31: e101096.

52. Schiele, Koperna, Brunner. Predicting intensive care unit bed occupancy for integrated operating room scheduling via neural networks. Nav Res Logist 2020; 68: 65–88.

53. National paediatric early warning system (PEWS) observation and escalation charts. London, UK: NHS England, 2023.

54. Clinical Practice Guidelines: Acceptable ranges for physiological variables (children). Melbourne, Victoria, Australia: Royal Children's Hospital Melbourne, 2020.

55. Bracken, Lane, Siner, et al. Assessing the performance of paediatric early warning scores to predict critical deterioration events in hospitalised children (the DETECT study): a retrospective matched case–control study. BMC Pediatr 2025; 25: 520.

56. Lei, Zhang, Yang, et al. Machine learning-based prediction of delirium 24 h after pediatric intensive care unit admission in critically ill children: a prospective cohort study. Int J Nurs Stud 2023; 146: 104565.

57. Balit, Larosa, Ong JSM, et al. Sedation protocols in the pediatric intensive care unit: fact or fiction? Transl Pediatr 2021; 10: 2814–2824.

58. Liu, Gao, Liu, et al. Development and validation of a practical machine-learning triage algorithm for the detection of patients in need of critical care in the emergency department. Sci Rep 2021; 11: 24044.

59. Zhang, Cui, Ding, et al. A cluster-randomized controlled trial of a nurse-led artificial intelligence assisted prevention and management for delirium (AI-AntiDelirium) on delirium in intensive care unit: study protocol. PLoS One 2024; 19: e0298793.

60. Labrague, Al Sabei, Al Yahyaei. Artificial intelligence in nursing education: a review of AI-based teaching pedagogies. Teach Learn Nurs 2025; 20: 210–221.

61. Babalola, Gaston, Trombetta, et al. A systematic review of collaborative robots for nurses: where are we now, and where is the evidence? Front Robot AI 2024; 11: 1398140.

62. Collins, Moons KGM, Dhiman, et al. TRIPOD + AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. Br Med J 2024; 385: e078378.

63. Sahiner, Chen, Samala, et al. Data drift in medical machine learning: implications and potential remedies. Br J Radiol 2023; 96: 20220878.

64. Michels EAM, Gilbert, Koval, et al. Alarm fatigue in healthcare: a scoping review of definitions, influencing factors, and mitigation strategies. BMC Nurs 2025; 24: 664.

65. National Institute for Health and Care Excellence. Evidence standards framework for digital health technologies. London: National Institute for Health and Care Excellence, 2023.
