Abstract
Objective
Digital twins (DTs) show promise in critical care by enabling personalised treatment and optimising clinical decision-making. Despite the complexity and data-intensive nature of critical care, the implementation of DTs in this setting remains under-investigated. This scoping review aimed to summarise DT research in critical care and identify current evidence gaps.
Methods
Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guidelines, seven electronic databases were searched. Studies reporting the development or evaluation of DT models in adult critical care were included. Data were extracted on study characteristics and DT development features, including modelling approaches, levels of data integration, and key findings.
Results
Twenty-three studies were included, most originating from North America and Europe. Retrospective designs using hospital datasets derived from intensive care unit and emergency department settings were common. Data integration predominantly corresponded to the digital model level of DT maturity, whereas fully automated DT implementations were rare. Regarding modelling approaches, mathematical models were most frequently developed, followed by machine learning-based predictive models. DT applications primarily focused on predictive modelling and virtual patient simulations to enhance personalised treatment, support clinical decision-making, and optimise organisational resource allocation.
Conclusion
DT technologies in critical care remain in the exploratory and early stages of development and implementation. Further research incorporating higher levels of data integration, real-time deployment, and longitudinal external validation is warranted, alongside broader consensus on ethical governance and data privacy.
Keywords
Introduction
Digital twins (DTs) are dynamic virtual representations of physical systems that are updated using real-time data. 1 DTs originated from simulation models developed by the National Aeronautics and Space Administration in the 1970s to support the prevention and resolution of Apollo mission accidents2,3 and have since evolved across academic domains.4,5 Since the 2000s, DTs have gained considerable attention in the context of the Fourth Industrial Revolution, particularly for managing product lifecycles and creating virtual counterparts of physical systems for real-time synchronisation, simulation, and predictive modelling. 6 Advances in information and communication technology, along with the increasing adoption of precision medicine, have expanded DT applications in healthcare, enabling exploration of their potential across diverse clinical contexts. 6
In clinical settings, DT technology has been implemented at multiple levels, ranging from organisational systems to patient-specific organs. 7 At the organisational level, artificial intelligence (AI)-supported DT systems have been deployed in a radiology department to address challenges posed by clinical complexity, ageing infrastructure, workflow delays, and growing patient demand. 8 At the patient level, DT interventions for type 2 diabetes create patient-specific virtual replicas by integrating continuous glucose monitoring, nutrition, activity, and sleep data; these replicas predict postprandial glucose responses and generate tailored recommendations, leading to improvements in haemoglobin A1c levels, medication use, and overall metabolic outcomes. 9 These examples demonstrate the broad utility of DTs across hospital management and personalised medicine. 10
Although DT technology has shown promise in various healthcare settings, its application in critical care remains in its infancy, with limited empirical research and clinical implementation owing to significant barriers in data integration, regulatory governance, and scalability. 7 While critical care is often synonymous with the intensive care unit (ICU), this review explicitly encompasses high-acuity environments such as emergency departments (EDs) and step-down units, acknowledging that the continuum of critical illness frequently begins with emergency stabilisation. 11 This broader scope is essential to capture DT applications across the full trajectory of acute and critical illness. 12 Critical care environments are characterised by continuous data streams, rapid fluctuations in patient status, and an ongoing influx of multivariate clinical information. 13 Clinicians must interpret these signals, make high-stakes decisions in real time, and deliver life-sustaining therapies to critically ill patients, which increases the need for advanced informatics solutions.13,14 Furthermore, information overload and patient heterogeneity limit the effectiveness of standardised treatment protocols, which may not adequately capture individual physiological responses. These limitations highlight the need for precision medicine approaches tailored to critical illness.15,16 DTs may help address this gap by providing a clearer physiological representation of each patient's state, enabling more consistent and effective care. 17
DTs integrate diverse data sources, including bedside monitoring, imaging, pharmacokinetic/pharmacodynamic models, and electronic health records (EHRs), to construct and continuously refine patient-specific virtual models.18,19 These models can facilitate dynamic risk assessments, treatment simulations, and individualised care pathways. By synthesising multimodal patient data, DTs offer a framework for simulating disease progression, predicting treatment responses, and conducting
Despite growing interest, inconsistent use of the term DT in the literature risks conflating it with simpler digital representations, such as digital models or shadows. 23 To address this conceptual ambiguity, DT-related technologies can be categorised operationally into three levels based on the direction and automation of data exchange between physical and digital entities: Digital Model (DM), Digital Shadow (DS), and Digital Twin (DT). 24 A DM is a static digital representation with no automated data exchange between physical and digital entities. A DS enables unidirectional data flow from the physical system to the digital representation, allowing real-time updates without feedback to the physical entity. In contrast, a DT is characterised by automated bidirectional data exchange, where the physical and digital entities dynamically interact, enabling real-time simulation and decision support. 24 Given that many studies described as DT may actually correspond to DM or DS, this classification was used to accurately assess the level of technical implementation in critical care research.
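Because the three levels differ only in the direction and automation of data exchange, the classification rule can be expressed as a small decision function. The sketch below is a minimal illustration, assuming each system can be described by two yes/no properties (automated physical-to-digital updating and automated digital-to-physical feedback); the function and enum names are hypothetical and do not represent the reviewers' actual coding tool.

```python
from enum import Enum


class IntegrationLevel(Enum):
    DIGITAL_MODEL = "DM"
    DIGITAL_SHADOW = "DS"
    DIGITAL_TWIN = "DT"


def classify_integration(auto_physical_to_digital: bool,
                         auto_digital_to_physical: bool) -> IntegrationLevel:
    """Map the direction and automation of data exchange to DM, DS, or DT.

    auto_physical_to_digital: the physical entity updates the digital one automatically.
    auto_digital_to_physical: the digital entity feeds back to the physical one automatically.
    """
    if auto_physical_to_digital and auto_digital_to_physical:
        # Automated bidirectional exchange.
        return IntegrationLevel.DIGITAL_TWIN
    if auto_physical_to_digital:
        # Automated one-way flow from physical to digital, no automated feedback.
        return IntegrationLevel.DIGITAL_SHADOW
    # No automated inbound updates. Feedback-only systems fall outside the
    # three-level scheme; treating them as DM is an assumption of this sketch.
    return IntegrationLevel.DIGITAL_MODEL


if __name__ == "__main__":
    print(classify_integration(True, False).value)  # DS
```

Under this rule, a decision support system that ingests monitoring data automatically but relies on clinicians to act on its output maps to DS, because the feedback path to the patient is not automated.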
Recently, several reviews have examined DT use in healthcare, including a systematic review of precision health outcomes regarding DT technology, 25 a broad scoping review of DT applications across health domains, 6 and a narrative review outlining the potential roles of DT in critical and acute care medicine. 26
Nonetheless, evidence specific to critical care remains limited, as the only review focused on this area offers primarily conceptual insights. Preliminary studies suggest that DTs can support the early detection of patient deterioration, guide therapy optimisation, and enable the
To date, no systematic or scoping review has provided a comprehensive overview of the development and implementation of DTs specifically in critical care settings. Given the unique complexity of these environments, a focused synthesis of DT-related research is warranted. Therefore, this scoping review aimed to systematically summarise published studies on DTs across all development and clinical application stages in critical care. This review charts the landscape of DT applications in adult critical care by examining study characteristics, aims, target populations, modelling approaches, and the level of data integration (classified as DM, DS, or DT). Through this analysis, we evaluated the extent of technological development and clinical implementation, provided an overview of the current state of research, and identified existing gaps. By integrating evidence from proof-of-concept models and early clinical evaluations, this review provides insights to guide future research and advance the clinical translation of DT technologies in critical care.
Methods
Design
This scoping review was conducted in accordance with the Joanna Briggs Institute (JBI) methodological guidance for scoping reviews and the procedures described in Chapter 10 of the JBI Manual for Evidence Synthesis.27,28 It was reported in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) checklist. 29 The PRISMA-ScR checklist for this review is presented in Supplementary File 1 (Table S1). In accordance with the scoping review framework, a formal risk-of-bias assessment was not performed, as the objective was to summarise methodological characteristics rather than to evaluate causal effect estimates.
The population, concept, and context (PCC) framework for this scoping review is as follows: (1) population: adult critically ill patients, (2) concept: digital twin (DT) technologies, and (3) context: critical care settings. The scoping review protocol was registered in the Open Science Framework Registry (registration number: https://doi.org/10.17605/OSF.IO/62HJD).
Search strategy (Information sources)
A comprehensive search strategy was employed, encompassing seven electronic databases (i.e., PubMed, IEEE Xplore, CINAHL, Cochrane Library, Embase, Scopus, and Web of Science). These databases were systematically searched in accordance with the PCC framework. Database-specific keywords and indexed terms were used without date restrictions, applying the “All fields” option where appropriate (Supplementary File 2, Table S2). Given the heterogeneity of related terminology (e.g., virtual patient, physiologic simulator, digital avatar, digital model), the search was intentionally restricted to the explicit term “digital twin*” to maintain conceptual specificity and ensure the retrieval of studies aligned with established DT criteria. The search strategy was reviewed and confirmed by an experienced health sciences librarian. The primary search was conducted on 20 January 2025 and subsequently updated on 17 July 2025.
An example of the full PubMed search string was as follows: “digital twin*” [All Fields] AND (“Critical care” [MeSH Terms] OR “Critical Illness” [MeSH Terms] OR “Intensive Care Units” [MeSH Terms] OR “Critical Care Nursing” [MeSH Terms] OR “Critical care” [All Fields] OR “critical illness” [All Fields] OR ICU [All Fields] OR “intensive care” [All Fields] OR “intensive care unit*” [All Fields])
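The query has a simple structure: one concept term combined by AND with a parenthesised OR block of context terms. Assembling it programmatically can help keep database-specific variants consistent. The sketch below is illustrative only; the helper name `or_block` is hypothetical, and straight quotation marks are used because PubMed's search syntax expects them in place of the typographic quotes shown in the printed string.

```python
def or_block(terms):
    """Join (term, field) pairs into one parenthesised OR group."""
    return "(" + " OR ".join(f"{term}[{field}]" for term, field in terms) + ")"


# Concept block: the explicit DT term only (no synonyms, by design).
concept = '"digital twin*"[All Fields]'

# Context block: MeSH headings plus free-text variants, mirroring the string above.
context_terms = [
    ('"Critical care"', "MeSH Terms"),
    ('"Critical Illness"', "MeSH Terms"),
    ('"Intensive Care Units"', "MeSH Terms"),
    ('"Critical Care Nursing"', "MeSH Terms"),
    ('"Critical care"', "All Fields"),
    ('"critical illness"', "All Fields"),
    ("ICU", "All Fields"),
    ('"intensive care"', "All Fields"),
    ('"intensive care unit*"', "All Fields"),
]

query = f"{concept} AND {or_block(context_terms)}"

if __name__ == "__main__":
    print(query)
```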
Eligibility criteria
Using the PCC framework, the following inclusion criteria were established: (1) population: studies that used patient data collected prospectively or retrospectively in real-world clinical settings or that included virtual patients simulated or generated from clinical data; (2) concept: research applying DT technologies or describing the development of a DT model for healthcare, provided that the system created an explicit virtual representation of a critical care entity (e.g., patient, organ system, workflow, or facility) and enabled dynamic modelling of state transitions in response to simulated or real-world clinical interventions; and (3) context: studies conducted in critical care settings such as ICUs and other high-acuity care settings (e.g., EDs).
To capture emerging developments in this rapidly evolving field, both peer-reviewed journal articles and relevant grey literature (e.g., preprints and conference papers) were considered eligible, provided they presented original research with sufficient methodological detail to enable appraisal and data extraction.
The exclusion criteria were as follows: (1) population: studies that included infant or paediatric patients; (2) concept: research that solely described theoretical frameworks of DT technologies or was limited to static prediction models, risk scoring tools, or data visualisation dashboards without an explicit virtual representation and state-transition mechanism; (3) context: studies not conducted in critical care settings (e.g., general wards, outpatient clinics, rehabilitation facilities, long-term care), including those in non-hospital or community-based contexts; (4) language: research not written in English; and (5) study design: review articles, editorials, commentaries, letters, book chapters, dissertations, and other non-original research articles.
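Read together, the inclusion and exclusion criteria reduce to a conjunction of conditions, with the concept criterion itself requiring both an explicit virtual representation and dynamic state transitions. A minimal sketch follows, assuming each screened record has already been coded with the hypothetical boolean fields shown; in practice these judgements were made by human reviewers, not by code.

```python
from dataclasses import dataclass


@dataclass
class ScreenedRecord:
    """Hypothetical coding of one record against the PCC criteria."""
    adult_population: bool
    explicit_virtual_representation: bool
    dynamic_state_transitions: bool
    critical_care_setting: bool
    written_in_english: bool
    original_research: bool


def is_eligible(record: ScreenedRecord) -> bool:
    # Concept: a virtual representation alone (e.g., a static risk score or
    # dashboard) is not enough; state transitions must also be modelled.
    concept_met = (record.explicit_virtual_representation
                   and record.dynamic_state_transitions)
    return all([
        record.adult_population,       # population
        concept_met,                   # concept
        record.critical_care_setting,  # context
        record.written_in_english,     # language
        record.original_research,      # study design
    ])
```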
Study selection
All studies identified in the comprehensive search were exported to EndNote 21 (Clarivate Analytics) for reference management. After duplicates were removed using EndNote, the remaining studies were screened using Google Sheets (Google LLC, Mountain View, CA, USA). Two researchers (Yeonw K and JK) independently screened the studies against the eligibility criteria. Any disagreements were resolved through discussion with a third researcher (Yeonj K or MC). Studies on which consensus was reached were included in the final selection.
Data extraction and analysis
Two researchers (Yeonw K and JK) independently extracted data from the selected studies. The process was facilitated by Elicit, an AI-based literature review assistant 30 that has previously been applied in a scoping review. 31 Elicit was used to locate information for pre-defined data fields (e.g., study design, study objectives, datasets, participant counts, and outcomes). All AI-generated outputs were treated as provisional and were not directly incorporated into the review. Each extracted data element was manually cross-validated against the original full-text article by two independent reviewers. Discrepancies were resolved through manual correction and discussion with a third researcher (Yeonj K). No AI-generated data were retained without independent human verification.
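The verification rule described above, under which AI output is provisional and only human-verified values are retained, can be sketched as follows. This is a hypothetical illustration rather than the reviewers' actual tooling; the function name, field names, and any example values are invented.

```python
def reconcile(ai_draft, reviewer_a, reviewer_b):
    """Keep only values that two reviewers independently verified and agreed on.

    ai_draft is used solely to enumerate fields; its values are never retained.
    Returns (verified, flagged), where flagged fields go to a third reviewer.
    """
    verified, flagged = {}, []
    for field in sorted(set(ai_draft) | set(reviewer_a) | set(reviewer_b)):
        a, b = reviewer_a.get(field), reviewer_b.get(field)
        if a is not None and a == b:
            verified[field] = a      # human-verified value
        else:
            flagged.append(field)    # disagreement or unverified field
    return verified, flagged


if __name__ == "__main__":
    draft = {"study_design": "retrospective", "sample_size": 120}  # invented AI draft
    a = {"study_design": "retrospective", "sample_size": 112}      # invented reviewer A
    b = {"study_design": "retrospective", "sample_size": 112}      # invented reviewer B
    print(reconcile(draft, a, b))
```

In this invented example, both reviewers correct the AI's sample size to the same value, so the corrected value is retained and the AI draft value is discarded.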
Data extraction forms were constructed for each included study and comprised three parts: (1) overall study characteristics, including year, continent, study design, study setting, data sources, and level of data integration of DTs; (2) study aims and clinical context, including dataset characteristics, critical care setting, clinical conditions, sample size, and intended end users; and (3) DT development and evaluation features, including modelling approaches, level of data integration of DTs, clinical input data, validation strategy, evaluation metrics, and key findings.
Table 1. Characteristics of the included studies (n = 18).
ᵃ “Not applicable” under Study Setting refers to studies conducted exclusively in simulation environments, rule-based modelling frameworks, or virtual platforms without the use of real-world clinical data.
* “Not reported” under Data Sources indicates that the data sources used for model development were not explicitly described, typically because the studies evaluated pre-existing clinical systems.
Table 2. Summary of study aims and clinical context of included studies (n = 18).
ᵃ Preprint.
AHRF: acute hypoxaemic respiratory failure; AI: artificial intelligence; ARDS: acute respiratory distress syndrome; CIDT: Critical Illness Digital Twin; DES: discrete-event simulation; DSS: decision support system; DT: digital twin; EHR: electronic health record; eICU-CRD: eICU Collaborative Research Database; ER: emergency department; HFNC: high flow nasal cannula; ICU: intensive care unit; ITU: intensive therapy unit; LVAD: left ventricular assist device; MIMIC: Medical Information Mart for Intensive Care; ML: machine learning; MV: mechanical ventilation; PROMMTT: Prospective, Observational, Multicenter, Major Trauma Transfusion; PSCOPE: Physiology Simulation Coupled Experiment; RCT: randomised controlled trial; Tele-ICU: tele-intensive care unit.
Table 3. Synthesis of digital twin modelling, evaluation features, and key findings (n = 18).
ᵃ Preprint.
ABMS: agent-based modelling and simulation; ABS: agent-based simulation; AHRF: acute hypoxaemic respiratory failure; AI: artificial intelligence; AUROC: area under the receiver operating characteristic curve; β-VAE: beta variational autoencoder; CI: confidence interval; CV: cross-validation; DAG: directed acyclic graph; DES: discrete-event simulation; DM: digital model; DS: digital shadow; DSS: decision support system; DT: digital twin; EAdi: electrical activity of the diaphragm; GCS: Glasgow Coma Scale; HCT: healthcare technician/assistant; HFNC: high flow nasal cannula; HR: heart rate; ICU: intensive care unit; IQR: interquartile range; LSTM: long short-term memory; MAPE: mean absolute percentage error; MLP: multi-layer perceptron; MV: mechanical ventilation; NASA-TLX: NASA Task Load Index; NAVA: neurally adjusted ventilatory assist; NIV: non-invasive ventilation; NR: not reported; ODE: ordinary differential equation; PEEP: positive end-expiratory pressure; PSCOPE: Physiology Simulation Coupled Experiment; PSV: pressure support ventilation; PVR: pulmonary vascular resistance; RCT: randomised controlled trial; RNN: recurrent neural network; SOFA: Sequential Organ Failure Assessment; SpO2: peripheral oxygen saturation; SUS: System Usability Scale; SVR: systemic vascular resistance; V/S: vital signs.
Results
Study selection
A systematic search conducted on 20 January 2025 and updated on 17 July 2025 identified 1,631 records. Following the removal of 163 duplicates, 1,468 titles and abstracts were screened, leading to the exclusion of 1,344 records. Of the 116 full-text articles assessed for eligibility, 93 were excluded, resulting in 23 studies included in the final synthesis. Of these, 18 were classified as primary studies (16 peer-reviewed journal articles and two preprints), whereas five conference papers are presented separately in Supplementary File 3 (Table S3). The full selection process is illustrated in the PRISMA flow diagram (Figure 1).
Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram.
Study characteristics
Table 1 summarises the overall characteristics of the included peer-reviewed articles and preprints. Publication years ranged from 2020 to 2025, with the majority of studies published in 2024 or later (11/18, 61.1%).12,38–47 Geographically, half of the studies were conducted in North America, specifically the United States (9/18, 50%).12,18,32,33,35,40,42,44,45
Regarding study design, most were retrospective (11/18, 61.1%),12,18,33,34,36,37,39,41,43,45,46 although two involved secondary analyses of prospectively collected cohort data.41,46 Half of the included studies were conducted in single-centre settings (9/18, 50%).18,32,33,35,39,41,45–47 “Not applicable” was assigned to studies based solely on simulation environments, rule-based modelling frameworks, or virtual platforms without real-world clinical data sources.40,42,44
Most studies used hospital datasets (11/18, 61.1%), including single-centre hospital cohorts or multicentre clinical trial datasets,12,18,32,34,35,37,41,43,45–47 whereas publicly available datasets such as the Medical Information Mart for Intensive Care (MIMIC) were used in three studies33,36,39 (3/18, 16.7%). A small number of studies relied on expert rule-based42,44 or simulation-generated data, 40 and one randomised controlled trial did not explicitly report its data source. 38
In terms of the level of data integration, the majority of studies were classified as DM (15/18, 83.3%),12,18,32–37,39,41–46 whereas one study was categorised as a DS 38 and two as full DTs.40,47
Study aims and clinical context
Table 2 presents the study aims and clinical context, including dataset, critical care setting, clinical conditions, sample sizes, and intended end users. ICUs were the most commonly examined setting, accounting for 83.3% (15/18) of the included studies.18,32–34,36–43,46,47 One study was conducted in an ED. 12 Two did not report the type of care unit: one involved hospitalised patients with trauma-related acute respiratory distress syndrome, 35 whereas the other implemented simulation-based scenarios representing critically ill patients and was therefore categorised as “Not reported”. 44
Regarding clinical conditions, eight studies (44.4%) focused on respiratory conditions, including patients requiring mechanical ventilation or high-flow nasal cannula therapy.34–38,41,43,46 Two studies each focused on sepsis32,39 and haemodynamic instability.12,47 One study examined ischaemic stroke progression 33 and another focused on patients with heart failure who underwent left ventricular assist device (LVAD) implantation. 45 The remaining studies (4/18, 22.2%) addressed mixed critical illness scenarios without focusing on specific patient conditions. Of these, two studies focused on simulating ICU environments for critical care delivery optimisation or nurse-robot collaboration,18,40 and two were designed for simulation-based education using rule-based models.42,44
Sample sizes were extracted as reported by the authors and reflect the cohorts used for model development or clinical evaluation, depending on study design. Four studies included more than 1,000 samples; all were retrospective and used large-scale datasets, such as the publicly available MIMIC-IV and eICU Collaborative Research Database or institutional electronic health record databases (e.g., Mayo Clinic).18,33,36,39 Most studies included fewer than 50 samples (11/18, 61.1%),32,34,37,40–47 including two prospective studies with 29 and 31 patients.32,47
Regarding intended end users, most studies targeted physicians, either explicitly or as part of broader clinician groups (13/18, 72.2%).12,32–34,36–39,41,43–46 A smaller number of studies identified other professional users, including internal medicine residents in ICUs, 42 and multidisciplinary ICU clinicians (physicians and nurses). 47 One study demonstrated active nursing involvement in DT use. 38 Beyond direct clinical users, several studies were designed for non-bedside stakeholders, such as ICU operational decision-makers, healthcare managers, and ML/AI model developers.18,35,40
Modelling approaches and key findings
Table 3 synthesises the DT modelling approaches, levels of data integration, clinical input data, validation strategies, evaluation metrics, and key findings of the included studies (n = 18). The modelling approaches were heterogeneous, encompassing mathematical models, ML-based models, simulation-based frameworks, and rule-based systems. Mathematical models representing physiological dynamics were the most common approach (8/18, 44.4%),12,34,37,38,41,43,45,46 followed by ML approaches (4/18, 22.2%), primarily used for predictive modelling and risk stratification.33,36,39,47 Simulation-based approaches mostly represented care delivery processes (3/18, 16.7%),18,35,40 and expert rule-based approaches were typically applied in educational or scenario-based simulations (2/18, 11.1%).42,44 One study combined approaches, using a hybrid causal AI model that integrates expert rule-based Bayesian networks with two simulation-based models. 32
Most studies used EHR-derived clinical inputs, such as demographic information, laboratory results, and vital signs. Two studies incorporated clinical severity scores.32,36 Studies targeting patients requiring mechanical ventilation or high-flow nasal cannula therapy used respiratory device parameters, such as ventilator and flow-rate data, as model inputs.9,34,36,37,41,43,46 Two studies modelled the entire ICU environment using hospitalisation records, clinical task types and durations, and information on capacity and resource utilisation.18,40
No study reported external validation using independent datasets. Four studies conducted internal validation, such as splitting datasets into training and validation subsets.33,36,39,47 Under these internal validation settings, classification models reported high area under the receiver operating characteristic curve (AUROC) values, while regression-based models demonstrated low prediction errors, including mean absolute percentage error (MAPE) values of approximately 5%.
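AUROC can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The pure-Python sketch below implements this pairwise definition, which is equivalent to the normalised Mann–Whitney U statistic. It runs in quadratic time and is intended only to illustrate the metric, not to reproduce how the included studies computed it.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative cases.

    labels: 0/1 outcomes; scores: model outputs (higher = more likely positive).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    credit = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                credit += 1.0   # correctly ranked pair
            elif p == n:
                credit += 0.5   # tie
    return credit / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. Because the metric depends only on ranking within the evaluation sample, a high value under internal validation does not guarantee similar discrimination in a new patient population.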
Several studies instead conducted clinical or system-level validation, typically by comparing simulated outputs against observed clinical parameters to assess clinical plausibility and performance. Depending on the modelling objective, various quantitative metrics were reported. Models predicting continuous clinical or operational variables reported prediction accuracy using measures such as MAPE,41,46 the coefficient of determination (R²),34,43,49 or confidence intervals showing no statistically significant differences between simulated and observed values. 18 Agreement-based metrics such as kappa coefficients were also used in one study. 32
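The continuous-outcome and agreement metrics named above have standard definitions, sketched below in plain Python under simplifying assumptions. MAPE is undefined when an observed value is zero, and R² is computed here as 1 − SS_res/SS_tot, which can differ from the squared correlation coefficient that some studies report under the same name. The included studies may have used other variants.

```python
def mape(observed, predicted):
    """Mean absolute percentage error (%); undefined if any observed value is 0."""
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined for zero observed values")
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two lists of categorical ratings."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    # Expected agreement if both raters labelled independently at their own base rates.
    p_expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (p_observed - p_expected) / (1.0 - p_expected)
```

For example, predictions of 95 and 210 against observed values of 100 and 200 each miss by 5%, giving a MAPE of about 5%.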
Despite these validation efforts and comparisons with real-world data, most systems did not demonstrate automated real-time data exchange or bidirectional integration with clinical infrastructures and were therefore classified as DM.
Among the studies that achieved DS or DT levels of integration, implementation depth varied substantially. Patel et al. (2024) 38 was classified as a DS despite conducting a prospective RCT-based clinical validation, as the decision support system operated in a unidirectional manner within a human-in-the-loop framework. In contrast, studies classified as DT, including those by Anyene et al. (2024) 40 and Nair et al. (2025), 47 demonstrated bidirectional or closed-loop architectures. However, these studies were conducted within simulation environments rather than being fully integrated into routine, real-time clinical workflows.
Discussion
Principal findings
This scoping review synthesised the current landscape of DT applications in adult critical care by examining study characteristics, clinical context, and DT modelling and evaluation features. Across the 18 included studies, the findings suggest that DT research in critical care remains at an early developmental stage, often characterised by small sample sizes, single-centre designs, and an absence of external validation.
Because DTs represent a relatively novel concept, no studies reporting the development and evaluation of functional DT models in critical care were identified before 2020. Since then, an upward trend has been observed, with more than 60% of the included studies published in 2024 or later. This surge likely reflects growing interdisciplinary interest in digital health innovation and biomedical simulation research. This momentum has been further accelerated by the coronavirus disease 2019 (COVID-19) pandemic, during which DTs were applied to simulate disease spread, optimise healthcare operations, evaluate treatment effects, and support drug development. 50 Although most studies originated from high-income regions, particularly the United States and Europe, the continued expansion of DT research across diverse healthcare contexts will be important to ensure broader applicability and equitable adoption.
In this expanding field of research, a substantial proportion of the included studies developed DT frameworks that used patient physiological data to simulate lung mechanics, predict treatment outcomes, and explore patient-specific strategies. Mechanical ventilation is particularly well suited to quantitative physiological modelling owing to continuous data collection and real-time monitoring. Recently, efforts have been made to integrate deep learning technologies to enhance decision support and implement automated decision-making in complex ventilator management. 36 This respiratory focus contrasts with previous DT reviews, in which most studies addressed specific organs, such as the heart, bones, and joints, or broader biological systems, such as the endocrine and immune systems. 51 Research on non-respiratory conditions, such as sepsis, stroke, and cardiovascular dysfunction, remains limited, and future DT studies should increasingly encompass these populations to better reflect the diversity of critical care. Achieving this will depend on acquiring high-quality, real-time clinical data, because data sources and technological tools shape the methodological approaches used in DT development.
Physicians were the primary end users, predominantly in a clinical decision support context. Although nursing staff were involved in the implementation and training process in one prospective study, the system itself remained largely physician-oriented. 38 Another study examining infusion rate optimisation demonstrated the potential relevance of DT tools for nursing-related medication management tasks, suggesting that certain applications may extend beyond physician-centred use. 47 At the organisational level, DT applications demonstrated potential for workflow optimisation, staffing allocation, and resource management, as illustrated by simulation-based studies such as that on nurse–robot collaboration. 40 To be effectively integrated into practice, DT interfaces and workflows must be tailored to the distinct cognitive demands and professional accountabilities of each user group. Bedside physicians require real-time synchronisation and actionable physiological insights for high-stakes decisions, whereas organisational decision-makers prioritise aggregated data simulations for resource management. For trainees, DTs function as high-fidelity, risk-free environments that enable the exploration of ‘what-if’ scenarios, bridging theoretical knowledge and practical clinical competence. 6 Given the early stage of DT development in healthcare, further research is needed to clarify how role-specific workflow integration and accountability structures should be operationalised across multidisciplinary ICU teams. Across the included studies, these applications spanned three broad functional domains: predictive diagnostics, patient-specific treatment simulation, and organisational-level optimisation.
Several DTs focused on predictive tasks, such as forecasting physiological deterioration or estimating response to therapy using retrospective data. Others implemented patient-specific simulation frameworks to model ventilation strategies, haemodynamic management, or glycaemic control, primarily within in-silico or controlled prospective settings. A smaller subset of studies extended beyond bedside decision support toward hospital-level optimisation, including workflow modelling and resource allocation simulations. This functional categorisation indicates that, although most current implementations remain model-centric and limited in real-world validation, DT research in critical care spans both individual patient management and broader organisational decision-making contexts.
In this scoping review, DT studies were categorised into mathematical (mechanistic), data-driven ML, simulation-based, and expert rule-based models. Mechanistic models offer physiological interpretability and precise representation of patient-specific states but are limited in capturing highly dynamic, non-linear systems such as acute haemorrhage. 12 In contrast, ML approaches accommodate complex pattern recognition and adaptive modelling, yet often lack transparency and depend on large-scale data. Although explainability techniques such as SHapley Additive exPlanations (SHAP) were applied in one study, 39 limited interpretability remains a barrier to clinical trust in high-stakes environments. For advanced DT applications, real-time adaptive control via online sequential learning with multi-objective model selection has been shown to reduce control error in streaming environments, 52 aligning with the core DT requirement of continuous synchronisation.
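To illustrate what continuous synchronisation can mean in practice, the sketch below uses recursive least squares (RLS) with a forgetting factor to re-estimate patient-specific parameters each time a new sample arrives. This is a deliberately simple, swapped-in technique, not the online sequential learning approach cited above. The single-compartment respiratory model (pressure = elastance × volume + resistance × flow + baseline pressure), the breathing waveform, and all parameter values are assumptions chosen for demonstration.

```python
import math


class RecursiveLeastSquares:
    """Online estimate of theta in y = theta . x, updated one sample at a time."""

    def __init__(self, n_params, forgetting=0.98, initial_covariance=1e3):
        self.theta = [0.0] * n_params
        # Large initial covariance = weak prior belief in the zero starting estimate.
        self.P = [[initial_covariance if i == j else 0.0 for j in range(n_params)]
                  for i in range(n_params)]
        self.forgetting = forgetting

    def update(self, x, y):
        n = len(self.theta)
        px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.forgetting + sum(x[i] * px[i] for i in range(n))
        gain = [v / denom for v in px]
        error = y - sum(t * xi for t, xi in zip(self.theta, x))
        self.theta = [t + g * error for t, g in zip(self.theta, gain)]
        # P <- (P - gain * x^T P) / forgetting; x^T P equals px^T because P is symmetric.
        self.P = [[(self.P[i][j] - gain[i] * px[j]) / self.forgetting for j in range(n)]
                  for i in range(n)]
        return error


def fit_demo(elastance=25.0, resistance=10.0, baseline=5.0, n_samples=200, dt=0.05):
    """Track a hypothetical single-compartment lung: P = E*V + R*Q + baseline."""
    rls = RecursiveLeastSquares(n_params=3)
    for k in range(n_samples):
        t = k * dt
        volume = 0.25 * (1.0 - math.cos(t))   # litres
        flow = 0.25 * math.sin(t)             # litres per second (dV/dt)
        pressure = elastance * volume + resistance * flow + baseline
        rls.update([volume, flow, 1.0], pressure)
    return rls.theta


if __name__ == "__main__":
    print([round(v, 2) for v in fit_demo()])
```

With noise-free simulated data, `fit_demo()` should recover estimates close to the assumed elastance, resistance, and baseline pressure. A forgetting factor below 1 discounts older samples, allowing estimates to track gradual changes in a patient's mechanics at the cost of greater sensitivity to noise.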
The persistent challenge of “data sparsity” in healthcare must be addressed. Similar to strategies in the engineering domain—where scarce fault data are supplemented by finite element simulations 53 —mechanistic simulation models can generate high-fidelity synthetic patient trajectories, providing robust in-silico datasets to mitigate data sparsity and high-dimensional complexity in ML model training. 35 Together, these trade-offs indicate that integrating mechanistic insight with data-driven adaptability may represent a more viable pathway for advancing DT implementation.
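A minimal sketch of mechanistic data augmentation is shown below. It assumes a single-compartment respiratory model under constant-flow inspiration, in which volume equals flow multiplied by time and airway pressure equals elastance × volume + resistance × flow + PEEP. The parameter ranges are hypothetical placeholders rather than values drawn from the included studies. Each synthetic patient carries its ground-truth parameters, which could serve as labels when training an ML model to infer respiratory mechanics from pressure traces.

```python
import random


def simulate_patient(elastance, resistance, peep, flow=0.5, t_insp=1.0, dt=0.1):
    """Airway pressure during constant-flow inspiration: P = E*V + R*Q + PEEP."""
    steps = int(round(t_insp / dt))
    pressures = []
    for i in range(steps + 1):
        volume = flow * i * dt  # litres delivered so far
        pressures.append(elastance * volume + resistance * flow + peep)
    return pressures


def synthetic_cohort(n_patients, seed=0):
    """Sample hypothetical patient parameters and simulate one breath each."""
    rng = random.Random(seed)  # seeded for reproducible synthetic data
    cohort = []
    for _ in range(n_patients):
        e = rng.uniform(10.0, 40.0)  # elastance, cmH2O/L (hypothetical range)
        r = rng.uniform(5.0, 20.0)   # resistance, cmH2O·s/L (hypothetical range)
        peep = rng.choice([5.0, 8.0, 10.0])
        cohort.append({"elastance": e, "resistance": r, "peep": peep,
                       "pressure": simulate_patient(e, r, peep)})
    return cohort
```

Mechanistic models used in practice are typically far richer than this; the point of the sketch is only that known parameters generate labelled trajectories at negligible cost.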
Despite these technical advances, the translation of DT models into clinically reliable systems remains constrained by fundamental limitations in study design. Most DT studies in critical care have been conducted in single-centre settings with small sample sizes, reflecting the time- and resource-intensive nature of developing, debugging, and validating these models.32,34,41,43,47 Although several studies used large-scale public datasets or institutional cohorts to enhance model robustness,18,33,36,39 the limited variable diversity of these datasets hinders their ability to capture the complexity of real-world critical care. The frequency of items recorded as ‘not reported’ or ‘not applicable’ for data provenance and study setting raises concerns about reproducibility and external validity, because unclear descriptions limit the ability to replicate findings and assess their generalisability. This highlights the need for greater methodological transparency and more consistent reporting in DT research.
Across the included studies, validation was predominantly limited to internal approaches such as split-sample or cross-validation, and no study reported independent external validation using separate datasets. In several cases, simulation outputs were compared with retrospective clinical data to assess internal consistency and calibration; however, such strategies do not establish real-world reliability. Notably, model performance has been reported to decline when models are applied at external hospitals, 54 underscoring the need for multi-institutional collaboration and prospective validation under routine clinical conditions to improve generalisability.
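A minimal sketch of the reporting this paragraph calls for: discrimination (AUROC, computed here as the probability that a positive case receives a higher score than a negative case) is reported separately for an internal hold-out set and for an independent external cohort, together with the difference. The `validate` function, the cohort format, and the toy model are hypothetical; the external cohort is assumed to come from a different site, not from a re-split of the development data.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical cohort format: (feature vectors, binary outcome labels).
Cohort = Tuple[List[List[float]], List[int]]


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def validate(model: Callable[[List[float]], float], internal_holdout: Cohort, external: Cohort) -> Dict[str, float]:
    """Report discrimination on held-out development data and on an independent external cohort."""
    report: Dict[str, float] = {}
    for name, (features, labels) in {"internal": internal_holdout, "external": external}.items():
        report[name] = auroc([model(x) for x in features], labels)
    # A large positive drop signals that internal validation overstated transportability.
    report["drop"] = report["internal"] - report["external"]
    return report
```

For example, with scores `[0.1, 0.4, 0.35, 0.8]` and labels `[0, 0, 1, 1]`, `auroc` returns 0.75.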
To characterise implementation maturity, studies were positioned along a continuum of data integration and clinical embedding. At the lowest level, several studies functioned as DMs, evaluated solely in simulation environments without direct patient data integration. Others incorporated retrospective EHR data, representing partial clinical linkage. One emerging preprint described a hardware-integrated simulation platform linking device-level data from an LVAD to a model; although classified as a DM in this review, its architecture borders on a DS. 45 Only one study progressed to prospective human-in-the-loop evaluation, testing a physiological model–based decision support system in a multicentre RCT context. 38 Closed-loop bidirectional DT mechanisms were largely confined to simulated settings, and no study reported fully automated, real-time DT deployment integrated into routine critical care practice.
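The maturity levels above can be expressed as a simple decision rule. The sketch below assumes the widely used distinction between a digital model, a digital shadow, and a digital twin according to whether data flow automatically from the physical system to the model and from the model back to the physical system; the review's own classification may have drawn on additional criteria, and the function and parameter names are hypothetical.

```python
from enum import Enum


class Maturity(Enum):
    DIGITAL_MODEL = "DM"   # no automated data flow between patient/device and model
    DIGITAL_SHADOW = "DS"  # automated flow from the physical side into the model only
    DIGITAL_TWIN = "DT"    # automated flow in both directions


def classify_maturity(auto_physical_to_digital: bool, auto_digital_to_physical: bool) -> Maturity:
    """Assign a maturity level from the automation of each data-flow direction."""
    if auto_physical_to_digital and auto_digital_to_physical:
        return Maturity.DIGITAL_TWIN
    if auto_physical_to_digital:
        return Maturity.DIGITAL_SHADOW
    # Without automated input from the patient or device, the system remains a digital
    # model even if its outputs are pushed automatically.
    return Maturity.DIGITAL_MODEL
```

Under this rule, a batch import of retrospective EHR data is not an automated physical-to-digital flow, and a recommendation that a clinician must enact is not an automated digital-to-physical flow.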
Emerging work presented in conference papers48,49,55–57 (see Supplementary File 3, Table S3) suggests ongoing movement toward greater integration and autonomy. Recent simulation-based studies have implemented stochastic modelling of mechanical ventilation protocols within closed-loop in-silico environments,48,49 while a prospective study reported clinical validation of a stochastic glycaemic control protocol in ICU patients, demonstrating safe and effective glucose management under controlled clinical conditions. 55 Although these findings indicate methodological progression, most developments remain confined to simulation-based or controlled prospective evaluation rather than routine automated deployment in clinical practice.
Predictive DT models generally reported high performance; however, heterogeneous reporting and inconsistent calibration metrics limited cross-study comparison. Many relied on surrogate or simulation-based outcomes rather than prospective clinical endpoints, indicating that predictive accuracy alone does not establish clinical readiness. Notably, one prospective study demonstrated both physiological improvements and clinician adherence of approximately 60%, 38 highlighting that usability and workflow integration are as critical as predictive performance for real-world implementation.
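Because inconsistent calibration metrics limited comparison, a small shared set of measures could be reported alongside discrimination. The sketch below computes three standard measures for binary outcomes: the Brier score, calibration-in-the-large (mean predicted risk minus observed event rate), and a binned reliability table. These are generic definitions rather than metrics taken from the included studies, and the equal-width binning is an assumption.

```python
from typing import List, Sequence, Tuple


def _check(probs: Sequence[float], outcomes: Sequence[int]) -> None:
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty predictions and outcomes")


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted risk and observed 0/1 outcome."""
    _check(probs, outcomes)
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_in_the_large(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean predicted risk minus observed event rate; 0 means no overall over- or under-prediction."""
    _check(probs, outcomes)
    return sum(probs) / len(probs) - sum(outcomes) / len(outcomes)


def reliability_table(probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """(mean predicted, observed rate, count) for each non-empty equal-width probability bin."""
    _check(probs, outcomes)
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    return [
        (sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b), len(b))
        for b in bins
        if b
    ]
```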
From an implementation perspective, most DT systems operated in batch or retrospective modes rather than true real-time processing, with limited reporting of run-time performance, computational scalability, or interoperability with existing ICU monitors and EHR systems. Only a small number of studies reported processing time,47,58 a notable gap given the time-critical nature of ICU decision-making. Human-in-the-loop configurations were more common than fully automated closed-loop systems, reflecting technical, infrastructural, and governance-related constraints that limit routine clinical integration.
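As few studies reported processing time, the sketch below shows one way per-update latency could be summarised: median, nearest-rank 95th percentile, and the fraction of updates completed within a time budget. The `update` callable and the budget are hypothetical placeholders, and timings measured on a development machine would not substitute for testing on ICU monitoring and EHR infrastructure.

```python
import math
import statistics
import time
from typing import Callable, Dict, List, Sequence


def profile_updates(
    update: Callable[[Sequence[float]], object],
    samples: List[Sequence[float]],
    budget_s: float,
) -> Dict[str, float]:
    """Time each model update and summarise latency against a per-update budget."""
    if not samples:
        raise ValueError("need at least one sample to profile")
    latencies = []
    for sample in samples:
        start = time.perf_counter()
        update(sample)
        latencies.append(time.perf_counter() - start)
    ordered = sorted(latencies)
    p95 = ordered[math.ceil(0.95 * len(ordered)) - 1]  # nearest-rank 95th percentile
    return {
        "n": len(latencies),
        "median_s": statistics.median(latencies),
        "p95_s": p95,
        "within_budget": sum(lat <= budget_s for lat in latencies) / len(latencies),
    }
```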
Despite current limitations, DT technology holds translational potential to support safer and more precise clinical care.6,7 In critical care, life-support management and weaning follow established protocols. 59 Integrating patient-specific, real-time physiological and device-derived data into simulation-based decision support could augment these protocols, particularly during weaning. Progression from controlled validation to embedded workflow integration and prospective real-world evaluation will be critical for routine DT deployment.
Challenges and future directions
These findings indicate that integrating DTs into ICU settings remains challenging and will require concerted efforts to address persistent problems in data quality, interoperability, and ethical governance. First, reliance on small, single-centre datasets, which constrains model performance and generalisability, was a limitation commonly identified across the included studies. Second, the lack of standardised data structures and interoperability across health systems remains a fundamental barrier to real-time data integration and scalable DT development. Third, ethical concerns surrounding patient privacy, informed consent, and data ownership necessitate robust governance frameworks and adherence to regulatory standards.
Despite these challenges, a staged roadmap may offer a structured pathway toward the clinical integration of ICU-specific DTs. For example, development could progress from conceptual modelling of high-frequency physiological signals and pharmacodynamic responses (Stage 1) through virtual patient simulation and counterfactual analysis (Stage 2), bedside shadow-mode prediction (Stage 3), and clinician-in-the-loop decision support (Stage 4), to tightly bounded automation under predefined safety constraints (Stage 5).
Across these stages, ICU-specific challenges such as handling non-stationary high-frequency data, ensuring safe clinician oversight of automation, and maintaining real-time reliability must be addressed.
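To make the stage boundaries concrete, the sketch below routes a single scalar recommendation (for example, a hypothetical device setting) differently at each stage of the roadmap: no bedside effect in Stages 1 to 3, clinician approval in Stage 4, and automatic application only within predefined bounds in Stage 5. The stage names follow the roadmap; the function, its parameters, and the bounds are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    CONCEPTUAL_MODEL = 1    # offline modelling of signals and responses
    VIRTUAL_PATIENT = 2     # simulation and counterfactual analysis
    SHADOW_MODE = 3         # bedside predictions logged but never acted on
    CLINICIAN_IN_LOOP = 4   # recommendations require explicit approval
    BOUNDED_AUTOMATION = 5  # automatic action within predefined safety limits


def route_recommendation(
    stage: Stage,
    proposed: float,
    lower: float,
    upper: float,
    clinician_approved: bool = False,
) -> Optional[float]:
    """Return the value to apply at the bedside, or None if nothing should be applied."""
    if stage <= Stage.SHADOW_MODE:
        return None  # outputs stay offline or in a shadow log
    if stage == Stage.CLINICIAN_IN_LOOP:
        return proposed if clinician_approved else None
    # Stage 5: act automatically only inside the predefined bounds; otherwise defer to a clinician.
    if lower <= proposed <= upper:
        return proposed
    return None
```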
Limitations and strengths
This study had some limitations. First, the search was restricted to literature published in English, potentially excluding relevant works published in other languages. Second, only critically ill adult populations were included, limiting generalisability to neonatal, paediatric, or non-critical care contexts. Third, although the search strategy focused on the term “digital twin”, studies describing DT-like systems under alternative terminology may not have been fully captured. Fourth, although preprints and conference proceedings were included to capture emerging evidence, the latter were summarised in a supplementary table rather than subjected to full analytical synthesis, given their preliminary nature and absence of full peer review. Lastly, as a scoping review, this study did not formally appraise methodological quality or risk of bias and therefore does not provide comparative judgements regarding study robustness.
Nonetheless, this study had several strengths. To our knowledge, this is the first scoping review to comprehensively summarise DT applications in critical care, encompassing both patient-centred applications for critically ill populations and system-level implementations for care delivery optimisation. By synthesising diverse methodological approaches and clinical implementation strategies, this review provides an overview of the current state of DT research in critical care and identifies key areas that require further empirical investigation.
Conclusions
To our knowledge, this is the first scoping review to systematically summarise DT applications in adult critical care. Across the 18 studies included in the analytical synthesis, DT research remained at an early developmental stage, with most studies relying on retrospective or single-centre data and lacking external, real-world validation. Nevertheless, this review identified potential for DTs across three key areas: enhancing clinical prediction and personalised treatment for patients, supporting decision-making and education for clinicians, and optimising operational efficiency for healthcare organisations. Although current applications are primarily targeted towards physicians, emerging work suggests opportunities for broader implementation across multidisciplinary healthcare teams. As the field matures, multicentre collaborations, expanded clinical use cases, strengthened data interoperability, and robust ethical governance will be necessary to facilitate the progression of DT systems from simulation-based models toward clinically evaluated tools.
Supplemental material
Supplemental material - Digital twin applications in adult critical care: A scoping review of current development and implementation trends
Supplemental material for Digital twin applications in adult critical care: A scoping review of current development and implementation trends by Yeonwoo Kim, Jiin Kim, Yeonju Kim, Mona Choi in Digital Health.
Footnotes
Acknowledgments
The authors acknowledge the research assistants YJ Choi and MS Lee, who helped organise and visualise the selected literature.
Ethical considerations
This article did not require ethical board approval because it is a review of published literature and did not involve human participants or animals.
Author contributions
All authors contributed to the study’s conceptualization and design. Yeonw K and Yeonj K performed the literature search, and Yeonw K and JK completed study selection and data extraction. Yeonw K, JK, and Yeonj K conducted the formal analysis. All authors contributed to the interpretation of findings. Yeonw K and JK drafted the initial manuscript. Yeonj K and MC provided critical comments and substantive editorial feedback on the draft manuscript. All authors reviewed and approved the final manuscript. MC provided supervision and funding acquisition.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (No. RS-2022-NR069414) and the Brain Korea 21 FOUR Project funded by the National Research Foundation (NRF) of Korea, Yonsei University College of Nursing.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and publication of this article.
Data Availability Statement
The dataset used and analysed for the present study is available upon reasonable request.
Guarantor
MC.
Supplemental material
Supplemental material for this article is available online.
