Abstract
Background
Autonomous medication delivery robots can streamline hospital logistics. However, their feasibility under elevator congestion remains uncertain.
Objective
To evaluate the feasibility of medication delivery robots in a tertiary hospital and quantify how the elevator operating rate (EOR, %) affects delivery success, delay, and user experience.
Methods
A prospective feasibility study was conducted in a tertiary hospital that uses a robot to deliver medications. We analyzed 122 non-urgent missions from June 18–29, 2025, spanning weekdays and weekends. Data included the Elevator Operating Rate (EOR), passenger and cargo counts, Elevator Waiting Time, and Elevator Travel Time. Delivery outcomes were recorded, and a Monte Carlo simulation was used to model failure probabilities under different congestion scenarios. Staff usability and workload were assessed using the System Usability Scale (SUS) and NASA Task Load Index (NASA-TLX).
Results
A higher EOR was strongly associated with more delivery failures. Most failures resulted from physical obstruction by passengers or cargo, and a high EOR coincided with greater elevator occupancy. Simulations incorporating space occupancy reproduced failure patterns similar to the in situ observations. An increased EOR also prolonged delivery time. Staff reported relatively high usability, but NASA-TLX scores indicated that frequent robot users felt greater time-related pressure, likely reflecting delays during congestion.
Conclusions
Autonomous medication delivery is feasible, but its performance is sensitive to elevator congestion. Effective deployment requires attention to elevator usage rates, and robotic medication delivery should be scheduled when congestion is below critical thresholds to ensure reliability and minimize staff burden.
1. Introduction
In large-scale tertiary hospitals, the demand for timely and reliable internal logistics is increasing because of the complexity of operations and the diversity of medical supplies. Traditionally, this process has relied on human delivery staff transporting medications from central pharmacies or warehouses to wards using handcarts. In response to this logistical complexity, some hospitals have adopted automation via pneumatic tube systems, small freight elevators, and ceiling-mounted rail transport. However, these solutions have notable limitations.1–5
Fragile or temperature-sensitive items, such as infusion solutions and glass containers, often require manual handling.6,7 Additionally, installation and retrofitting entail high costs and architectural constraints, which limit adaptability to existing infrastructures.
Autonomous indoor delivery robots offer a scalable alternative that leverages existing building layouts and requires minimal infrastructure modification.8 These robots can navigate hospital corridors, interface with elevators, and deliver medications along routes similar to those used by human delivery staff.9
Several companies have already commercialized autonomous delivery robots, including ST Engineering Aethon’s TUG (USA);10 Relay Robotics’ Relay (USA);11 Academy of Robotics’ Milton Helper Bot (UK);12 Pudu Robotics’ FlashBot (China);13 Panasonic’s HOSPI (Japan);14 LG Electronics’ CLOi ServeBot (Korea);15 and DOGU Co., Ltd.’s IROI (Korea).16 To standardize capabilities across platforms, Relay Robotics introduced a classification framework for indoor delivery robots, analogous to automotive autonomy levels.17 Under this framework, most hospital delivery robots today qualify as Level 3, capable of autonomous navigation with onboard sensing and dynamic obstacle avoidance, and a subset qualifies as Level 3+ by autonomously calling and boarding elevators.17
However, translating Level 3/3+ capabilities into dependable day-to-day services remains challenging. Operational difficulties, such as building integration, workflow disruption, and physical interference with passengers in shared spaces, must be addressed to achieve consistently reliable performance.
Elevator utilization is an important but insufficiently explored factor. Because delivery robots rely on shared elevators to access multiple floors, their performance can be significantly affected by elevator congestion. In practice, hospitals rarely have an elevator dedicated solely to robots,18 making it essential to understand how variation in the elevator operating rate (EOR) affects delivery success and efficiency. Research is therefore needed to examine the feasibility of robot deployment in environments where robots and humans share elevators.
This study aimed to address this gap by evaluating the feasibility of autonomous medication delivery robots in a tertiary hospital, with a specific focus on elevator utilization. We analyzed real-world data, including EOR, waiting times, and travel patterns, and developed a Monte Carlo model of occupancy-driven elevator blockage to estimate delivery failure probabilities under different congestion scenarios and to explain the failure mechanisms. We also assessed user experience (System Usability Scale19) and perceived workload (NASA Task Load Index20). By integrating empirical observations with simulations, we provide actionable guidance for congestion-aware dispatch and operational planning in high-traffic hospital environments.
2. Methods
2.1. Study design and setting
This prospective, single-site feasibility study was conducted at a high-traffic tertiary hospital (
Preliminary data on the EOR were collected prior to testing, confirming that utilization peaked during weekday clinical hours (09:00–17:00) and dropped below 50% during nighttime and weekends. Accordingly, delivery trials were scheduled to evenly cover both high-congestion (weekday daytime) and low-congestion (weekday nighttime and weekend) conditions.
The delivery workflow is illustrated in Figure 1(a)–(g). Briefly, emergency department staff initiate a request via a web application, and a pharmacist prepares the delivery robot and loads the medications. The robot calls and boards an elevator and then traverses the emergency department corridors to the destination. On arrival, the emergency nurse retrieves the medications from the robot. This end-to-end path leverages existing infrastructure but exposes the robot to ambient passenger and cart traffic, which is central to assessing its feasibility.
Figure 1. Autonomous medication delivery process. (a) An emergency department staff member requests a delivery via the web application. (b) The pharmacist activates the robot’s touchscreen interface. (c) The pharmacist loads medications into the robot’s storage drawer. (d) The robot boards the elevator. (e) The robot navigates the emergency department corridor. (f) The nurse retrieves the delivered medications from the robot. (g) Schematic of the end-to-end delivery route, with arrows indicating the remote call, forward, and return paths.
2.2. Robot description
This study employed the
2.3. Task workflow by occupation
The integration of the delivery robot requires adjustments to existing medication delivery workflows across multiple occupations. Traditionally, nurses requested medication from ancillary support staff, who manually visited the pharmacy to collect prescriptions prepared by the pharmacists and deliver the medications to the wards.
When a robot is deployed, the workflow must be modified to accommodate autonomous delivery. A nurse still initiates the medication request, but instead of walking to the pharmacy, the support staff dispatch the robot remotely. When the robot arrives at the pharmacy, the pharmacist places the prepared medication in the robot’s storage drawer and starts the delivery by pressing the “start delivery” button on the robot interface.
On arrival at the ward, the support staff typically retrieve the medication and deliver it to a nurse; occasionally, the nurse retrieves the medication directly from the robot. Participants in each role were briefed on the new delivery process before testing.
This modified workflow preserved the existing hierarchical request chain while reducing the walking burden on the staff and allowing pharmacists and nurses to interact with the robot directly when necessary. This task division enabled a systematic feasibility assessment of integrating robots into hospital operations, revealed differences in occupational preferences for interacting with the system, and highlighted shifts in workload distribution among nurses, pharmacists, and support staff.
2.4. Data collection
We conducted feasibility testing from June 18 to June 29, 2025 (Asia/Seoul). Across this period, 135 delivery requests were logged. Per protocol, urgent medication runs were excluded (n = 13), yielding 122 eligible missions for analysis. To cover diverse temporal conditions, missions spanned both weekdays (n = 80) and weekends (n = 42), averaging 11.09 missions per active day. Because missions followed real clinical needs, within-day inter-mission gaps varied widely (from 6 to 1,044 minutes).
For each delivery trial, we logged the passenger headcount, number of cargo items, and whether the attempt ultimately succeeded. A trained operator accompanied the robot for safety (medication handling) and, at elevator entry, visually counted co-occupants and annotated cargo status on a standardized case-report form (integer co-occupant count; categorical cargo codes). Delivery success was defined as the robot completing the entire route and handing over the medication without any human intervention. In contrast, delivery failure was defined as the robot remaining stationary for > 1 min at any point or requiring manual assistance due to passenger/cargo obstruction. In each failure case, a specific failure type was annotated to facilitate the subsequent analysis. To minimize observer bias, the operator did not interact with or brief bystanders about the robot, recorded counts immediately at entry using the pre-specified coding scheme, and time-stamped all entries synchronized to the robot system clock. Handwritten entries were cross-checked against robot communication logs for timestamp consistency, and any discrepancies were reconciled by alignment. Because co-occupant counts are not available from the current platform’s on-board sensors, some residual observer bias may remain for these annotations. Nevertheless, the primary temporal variables—Elevator Operating Rate (EOR), Elevator Waiting Time (s), and Elevator Travel Time (s)—were derived from objective device logs, limiting the influence of any observational error on the main findings.
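The outcome definitions above amount to a simple decision rule. The sketch below applies them to one mission summary; the `MissionLog` record and its field names are hypothetical, since the robot's actual log schema is not described here.

```python
from dataclasses import dataclass


@dataclass
class MissionLog:
    """Hypothetical per-mission summary; field names are illustrative, not the robot's schema."""
    route_completed: bool            # robot reached the destination and handed over
    stationary_spans_s: list[float]  # length (s) of every period the robot stood still
    manual_assists: int              # number of human interventions during the mission


STALL_LIMIT_S = 60.0  # failure if the robot stays stationary for more than 1 min


def classify_mission(log: MissionLog) -> str:
    """Label a mission 'success' or 'failure' following the definitions in Section 2.4."""
    stalled = any(span > STALL_LIMIT_S for span in log.stationary_spans_s)
    if stalled or log.manual_assists > 0 or not log.route_completed:
        return "failure"
    return "success"


print(classify_mission(MissionLog(True, [12.0, 45.0], 0)))
```

A stop of exactly 60 s still counts as a success here, because the definition requires the robot to remain stationary for more than 1 min.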
Operational definitions of elevator-related variables.
Elevator congestion, quantified by the EOR (%), was the primary indicator of traffic conditions in this study and was used to analyze the impact of congestion on delivery success. The EOR is defined in Equation (1).
Figure 2. EOR and delivery performance. (a) Mean hourly EOR (%) by weekday; cell labels show means. (b) EOR distribution by delivery outcome (violin with median and quartiles; points = individual deliveries). (c) ROC for predicting success from EOR (AUC = 0.779; 95% CI 0.661–0.882). The shaded ribbon denotes the bootstrap 95% CI for the ROC curve; the red marker indicates the Youden-optimal threshold (59%).

For mission-level analysis, the EOR value corresponding to the hour in which the robot was called (mission initiation time) was assigned to that mission. For example, missions initiated between 13:00 and 14:00 were matched to the EOR calculated for the 13:00–14:00 interval. No rolling-window averaging across mission duration was applied.
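The hour-matching rule can be expressed as a truncation of the call timestamp to its clock hour followed by a lookup. A minimal sketch, assuming hourly EOR values are stored in a dictionary keyed by the hour's start time and that each hour is the half-open interval [hh:00, hh+1:00); both are assumptions about implementation details the text does not specify:

```python
from datetime import datetime


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its clock hour, e.g. 13:47 -> 13:00."""
    return ts.replace(minute=0, second=0, microsecond=0)


def assign_eor(mission_starts, hourly_eor):
    """Return, for each mission start time, the EOR (%) of the hour in which it began.

    hourly_eor maps hour-start datetimes to EOR values. Missing hours yield None.
    No rolling average over the mission duration is applied.
    """
    return [hourly_eor.get(hour_bucket(t)) for t in mission_starts]


# Usage with made-up EOR values (not study data)
hourly = {datetime(2025, 6, 18, 13): 72.4, datetime(2025, 6, 18, 14): 55.0}
starts = [datetime(2025, 6, 18, 13, 47), datetime(2025, 6, 18, 14, 0)]
print(assign_eor(starts, hourly))  # [72.4, 55.0]
```

Under the half-open convention, a call placed at exactly 14:00 belongs to the 14:00–15:00 interval.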
2.5. Simulation modeling
A Monte Carlo simulation was developed to estimate delivery success probabilities under varying congestion levels. The elevator cabin (1.6 m × 2.15 m) was modeled as a two-dimensional rectangle, with the robot represented as a circle (radius = 0.3 m), passengers as circles (radius = 0.2 m), and cargo items as rectangles (0.4 m × 0.7 m). Cargo was first placed near the side walls under non-overlapping constraints, after which passengers were sampled from the remaining cells on a 0.10 m grid with minimum separations, with small uniform jitter to avoid lattice artifacts. The robot advanced in 0.10 m steps toward a fixed boarding target, while passengers executed vertical/diagonal sidesteps, and the cargo shifted vertically when the robot came within a trigger distance
The simulation was designed to elucidate geometric mechanisms rather than to provide a fully parameter-exhaustive predictive model. Sensitivity analyses performed during model development indicated that moderate variations in key parameters (e.g., trigger distance, step size, or passenger sidestep allowance) did not materially alter the qualitative structure of the failure surface. In particular, the nonlinear escalation of failure probability with increasing passenger count and the amplifying effect of doorway-adjacent cargo were preserved across plausible parameter ranges. This robustness supports the use of the model as a mechanistic complement to the empirical EOR-based analysis, rather than as a finely tuned site-specific predictor.
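To make the geometric mechanism concrete, the following is a deliberately simplified Monte Carlo sketch in the spirit of the model above; it is not the study's simulator. It keeps the stated cabin, robot, passenger, and cargo dimensions and the jittered 0.10 m grid, but assumes (hypothetically) a door centred on one 1.6 m wall, a straight-line approach to a target 1.0 m inside the car, static cargo against alternating side walls, and a single lateral sidestep of at most 0.5 m per passenger. The study's stepwise motion, diagonal sidesteps, cargo shifts, and trigger distance are not reproduced. Boarding fails when any passenger in the robot's path cannot step aside.

```python
import numpy as np

# Dimensions from Section 2.5 (metres). DOOR_X, TARGET_Y, MAX_SIDESTEP and the
# cargo placement rule are hypothetical simplifications, not the study's settings.
W, D = 1.6, 2.15                 # cabin width (x) and depth (y); door assumed on the y = 0 wall
R_ROBOT, R_PAX = 0.3, 0.2        # robot and passenger radii
CARGO_W, CARGO_D = 0.4, 0.7      # cargo footprint
GRID = 0.10                      # passenger placement grid
DOOR_X, TARGET_Y = W / 2, 1.0    # hypothetical straight approach path
MAX_SIDESTEP = 0.5               # hypothetical maximum lateral sidestep per passenger
CLEAR = R_ROBOT + R_PAX          # centre distance at which robot and passenger just touch


def hits_cargo(p, cargo):
    """True if a passenger disc centred at p overlaps any cargo rectangle."""
    for x0, y0 in cargo:
        nearest = np.clip(p, [x0, y0], [x0 + CARGO_W, y0 + CARGO_D])
        if np.hypot(*(p - nearest)) < R_PAX:
            return True
    return False


def is_free(p, others, cargo):
    """p lies inside the cabin and is clear of cargo and of all other passengers."""
    inside = R_PAX <= p[0] <= W - R_PAX and R_PAX <= p[1] <= D - R_PAX
    return bool(inside and not hits_cargo(p, cargo)
                and all(np.hypot(*(p - q)) >= 2 * R_PAX for q in others))


def sample_scene(n_pax, n_cargo, rng):
    """Cargo against alternating side walls, then passengers on a jittered 0.10 m grid."""
    cargo = [(0.0 if i % 2 == 0 else W - CARGO_W, rng.uniform(0.0, D - CARGO_D))
             for i in range(n_cargo)]  # overlap between cargo items is not checked
    cells = np.array([(x, y)
                      for x in np.arange(R_PAX, W - R_PAX + 1e-9, GRID)
                      for y in np.arange(R_PAX, D - R_PAX + 1e-9, GRID)])
    pax = []
    for idx in rng.permutation(len(cells)):
        if len(pax) == n_pax:
            break
        p = cells[idx] + rng.uniform(-0.02, 0.02, size=2)  # jitter avoids lattice artefacts
        if is_free(p, pax, cargo):
            pax.append(p)
    return pax, cargo


def boarding_fails(pax, cargo):
    """Robot drives straight from (DOOR_X, 0) to (DOOR_X, TARGET_Y). Each passenger
    within CLEAR of that segment gets one lateral sidestep; if none fits, boarding fails."""
    pax = [p.copy() for p in pax]
    for i, p in enumerate(pax):
        gap = np.hypot(p[0] - DOOR_X, p[1] - np.clip(p[1], 0.0, TARGET_Y))
        if gap >= CLEAR:
            continue                                   # passenger is not in the way
        others = pax[:i] + pax[i + 1:]
        for dx in sorted([DOOR_X - CLEAR - p[0], DOOR_X + CLEAR - p[0]], key=abs):
            q = p + np.array([dx, 0.0])
            if abs(dx) <= MAX_SIDESTEP and is_free(q, others, cargo):
                pax[i] = q                             # step aside, nearer side tried first
                break
        else:
            return True                                # no sidestep clears the path
    return False


def failure_rate(n_pax, n_cargo, trials=2000, seed=2025):
    """Monte Carlo estimate of P(boarding fails) for one occupancy configuration."""
    rng = np.random.default_rng(seed)
    fails = sum(boarding_fails(*sample_scene(n_pax, n_cargo, rng)) for _ in range(trials))
    return fails / trials
```

Sweeping `failure_rate(n_pax, n_cargo)` over a grid of occupancies yields a surface of the same kind as the simulated heatmap. In this simplified geometry, side-wall cargo occupies exactly the lateral refuges that blocking passengers need, which is one way to see why doorway-adjacent cargo amplifies failures.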
2.6. Usability and workload assessment
User experience was evaluated in three occupational groups that had direct hands-on contact with the robot: (1) pharmacists (n = 7), who interacted with it mainly while loading medications into the storage drawer; (2) support staff (n = 7), who initiated most missions and retrieved the payload on arrival; and (3) registered nurses (n = 8), who engaged with the robot only when support staff were unavailable and a nurse had to collect the medications instead.
To assess perceived usability, we administered the SUS,19 a 10-item questionnaire widely used to evaluate digital health interventions. Each item was rated on a 5-point Likert scale (1 = strongly disagree to 5 = strongly agree). Raw responses were converted using standard procedures: odd-numbered items were scored as (response – 1) and even-numbered items as (5 – response), resulting in a range of 0–4 per item. The sum (0–40) was multiplied by 2.5 to obtain a final SUS score of 0–100, where higher values indicate better usability.
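The scoring rule is easy to get wrong because odd- and even-numbered items run in opposite directions. A minimal sketch, with one respondent's answers listed in item order (item 1 first):

```python
def sus_score(responses):
    """System Usability Scale score (0-100) from ten 1-5 Likert responses, item 1 first."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    # List index 0 is item 1 (odd-numbered): score r - 1; even-numbered items: 5 - r.
    contributions = [(r - 1) if i % 2 == 0 else (5 - r) for i, r in enumerate(responses)]
    return 2.5 * sum(contributions)


print(sus_score([3] * 10))    # 50.0: all-neutral answers sit at the scale midpoint
print(sus_score([5, 1] * 5))  # 100.0: full agreement with positive items, disagreement with negative ones
```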
Workload was assessed using the NASA-TLX,20 which measures perceived burden across six dimensions: mental demand, physical demand, temporal demand (time pressure), performance, effort, and frustration. Each item was scored on an 11-point scale (0–10). The final TLX score was computed by weighting each raw score using participant-specific pairwise comparisons of subscale importance, yielding a composite score of 0–10, where a higher value indicates a greater overall workload.
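Assuming the standard NASA-TLX weighting procedure, each participant picks the more important dimension in each of the 15 possible pairs, a dimension's weight is the number of times it was picked (0–5, summing to 15), and the composite is Σ wᵢrᵢ / 15, which stays on the 0–10 scale used here. A minimal sketch:

```python
from itertools import combinations

DIMENSIONS = ("mental", "physical", "temporal", "performance", "effort", "frustration")
PAIRS = list(combinations(DIMENSIONS, 2))  # the 15 pairwise comparisons


def tlx_weights(choices):
    """Tally how often each dimension was picked as more important (one pick per pair)."""
    if len(choices) != len(PAIRS):
        raise ValueError("expected exactly one choice for each of the 15 pairs")
    weights = {d: 0 for d in DIMENSIONS}
    for pick in choices:
        weights[pick] += 1
    return weights


def weighted_tlx(ratings, weights):
    """Composite workload sum(w_i * r_i) / 15, on the same 0-10 scale as the ratings."""
    return sum(weights[d] * ratings[d] for d in DIMENSIONS) / len(PAIRS)


# Illustrative participant who always picks the first dimension of each pair
w = tlx_weights([a for a, b in PAIRS])
print(w["mental"], weighted_tlx({d: 6 for d in DIMENSIONS}, w))  # 5 6.0
```

Because the weights sum to 15, uniform ratings reproduce themselves exactly; the weighting only matters when ratings differ across dimensions.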
Both questionnaires were completed after the entire four-week delivery-testing phase.
2.6.1. Survey administration
The SUS and NASA-TLX were administered in paper format as part of a standardized case-report form (CRF) packet within 48 h after the four-week phase concluded. No identifying information was recorded. Paper responses were manually transcribed into the study database using double data entry with adjudication of discrepancies.
2.7. Multivariable analysis
We prespecified three parsimonious logistic models for delivery failure (yes/no): M0 including Elevator Operating Rate (EOR, per 10% increment), M1 including the passenger count (per passenger), and M2 including both EOR and the passenger count. EOR represents background congestion, whereas the passenger count reflects momentary, within-elevator crowding. We report odds ratios (ORs) with 95% confidence intervals and assessed collinearity via VIF; full model specifications and outputs are provided in Table S3 (Supplementary Material 1).
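Each of M0–M2 is an ordinary binary logistic regression. The sketch below fits one with NumPy alone by Newton-Raphson (iteratively reweighted least squares) rather than through statsmodels, and converts the slopes into ORs with Wald 95% CIs. The data are synthetic and purely illustrative; dividing EOR by 10 makes its OR refer to a 10-percentage-point increase, as in M0 and M2.

```python
import numpy as np


def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).
    X is (n, k) without an intercept column; returns coefficients and standard
    errors for [intercept, x1, ..., xk]."""
    X = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information matrix
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def odds_ratios(beta, se, z=1.959964):
    """Exponentiate the slopes (intercept dropped) into ORs with Wald 95% CIs."""
    b, s = beta[1:], se[1:]
    return np.exp(b), np.exp(b - z * s), np.exp(b + z * s)


# Synthetic illustration only (NOT study data): an M2-style model, EOR per 10 %.
rng = np.random.default_rng(2025)
eor = rng.uniform(20, 100, size=5000)
pax = rng.poisson(3, size=5000)
logit = -6.0 + 0.3 * (eor / 10) + 0.5 * pax
failed = (rng.uniform(size=5000) < 1 / (1 + np.exp(-logit))).astype(float)
beta, se = fit_logistic(np.column_stack([eor / 10, pax]), failed)
or_, lo, hi = odds_ratios(beta, se)
print(or_, lo, hi)
```

With enough synthetic observations the fitted slopes recover the generating values (0.3 and 0.5 on the log-odds scale); with 122 missions, as in the study, the intervals are much wider.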
Statistical analyses were performed in Python 3.11.8 (Windows 10, AMD64) using NumPy 1.26.4, pandas 2.3.3, SciPy 1.11.3, scikit-learn 1.3.2, Matplotlib 3.8.0, Seaborn 0.13.2, and statsmodels 0.14.5; random procedures used a fixed seed (SEED = 2025).
This study was conducted and reported in accordance with the STROBE guidelines for observational studies. The completed checklist is provided in Supplementary Material 2.
3. Results
3.1. Relationship between the elevator operating rate and delivery failures
To investigate the impact of elevator traffic on delivery outcomes, the EOR data corresponding to each delivery trial were retrieved and matched to each trial’s success or failure. Figure 2(b) illustrates the distribution of the EOR values across successful and failed deliveries. While successful deliveries were distributed over a wide range of EOR values, failures were heavily concentrated in the upper tail (median ∼90%), indicating that high elevator congestion markedly increased the risk of failure.
Receiver operating characteristic (ROC) analysis yielded an area under the curve (AUC) of 0.779 (95% CI, 0.661–0.882), indicating good discriminative performance, and the Youden index22 identified 59.01% as the EOR cutoff that best distinguished successful deliveries from failed attempts. The overall success rate across all trials was 87.03%; when restricted to missions with EOR < 59.01%, the success rate increased to 95.52%. Furthermore, because a subset of the remaining failures was unrelated to elevator congestion (e.g., corridor navigation errors), excluding these cases raised the estimated success rate to approximately 100%. Taken together, these results support the practical use of an EOR decision boundary of approximately 60% for autonomous dispatch.
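The cutoff search can be sketched as follows: every observed EOR value is tried as a threshold, missions at or above it are flagged as high-risk, and the threshold maximizing Youden's J = sensitivity + specificity − 1 is kept. The AUC is computed here via its Mann-Whitney interpretation. This sketch frames the ROC in terms of predicting failure; predicting success from lower EOR gives the same AUC. The toy data are invented.

```python
import numpy as np


def youden_cutoff(eor, failed):
    """Return (cutoff, J) maximising Youden's J = sensitivity + specificity - 1.

    A mission is flagged high-risk when EOR >= cutoff; every observed EOR value
    is tried as a candidate cutoff."""
    eor = np.asarray(eor, dtype=float)
    failed = np.asarray(failed, dtype=bool)
    best_cut, best_j = None, -np.inf
    for cut in np.unique(eor):
        flagged = eor >= cut
        sensitivity = flagged[failed].mean()       # failures correctly flagged
        specificity = (~flagged[~failed]).mean()   # successes correctly not flagged
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cut, best_j = float(cut), float(j)
    return best_cut, best_j


def auc(eor, failed):
    """ROC AUC via the Mann-Whitney statistic: P(EOR_failure > EOR_success), ties = 1/2."""
    eor = np.asarray(eor, dtype=float)
    failed = np.asarray(failed, dtype=bool)
    f = eor[failed][:, None]
    s = eor[~failed][None, :]
    return float(np.mean((f > s) + 0.5 * (f == s)))


# Toy data (not study data): EOR (%) and whether each mission failed
toy_eor = [30, 45, 55, 62, 70, 88, 95, 97]
toy_failed = [0, 0, 0, 0, 1, 0, 1, 1]
print(youden_cutoff(toy_eor, toy_failed), auc(toy_eor, toy_failed))
```

In the toy data the best cutoff is 70 (all three failures flagged, one of five successes wrongly flagged, J = 0.8), and the AUC is 14/15.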
In a prespecified multivariable model (M2) including both Elevator Operating Rate (per 10%) and the passenger count (per passenger), both predictors remained independently associated with delivery failure (EOR OR = 1.33, 95% CI 1.03–1.72; passenger count OR = 1.73, 95% CI 1.19–2.51; VIF ≈ 1.14; see Table S3, Supplementary Material 1).
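With only two predictors in M2, each predictor's VIF reduces to 1/(1 − r²), where r is the correlation between them. If the EOR–passenger correlation reported in Section 3.2 (r = 0.352) was computed on the same missions that enter M2 (an assumption), it reproduces the reported value. Expressing EOR per 10% does not change the result, because correlation, and hence VIF, is invariant to linear rescaling.

```latex
\mathrm{VIF} = \frac{1}{1 - r^{2}} = \frac{1}{1 - 0.352^{2}} = \frac{1}{0.8761} \approx 1.14
```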
3.2. Failure‐type classification and analysis
The 14 delivery failures were classified into three types: (i) corridor self-drive (n = 4), (ii) elevator-call communication (n = 2), and (iii) elevator blockage during car entry/exit (n = 8). Corridor self-drive errors arose when the robot’s avoidance logic could not resolve tight trolley spacing or large static obstacles, or when traction was lost near the loading zone. Communication errors occurred immediately after the door opened, when low wireless coverage induced a latency of 5–7 s that allowed the doors to close before the controller–robot handshake completed, aborting the mission. Elevator blockage errors were triggered when passengers or cargo carts obstructed the entry/exit path, preventing completion within the timeout.
Figure 3(a) shows that elevator blockage events were concentrated at a very high EOR (median = 97.3%; highlighted by a dashed ellipse and annotation). In contrast, corridor self-drive errors tended to occur at a moderate EOR (median = 61.7%) and communication errors at a higher EOR (median = 78.5%). Figure 3(b) shows a positive association between EOR and co-occupant count (Pearson r = 0.352, 95% CI 0.186–0.499, p < 0.001). In line with this crowding effect, the failure rate increased sharply once the passenger count exceeded three and surpassed 90% when six or more occupants were present (Figure 3(c)).
Figure 3. Classification of delivery failures and association with elevator congestion. (a) Distribution of EOR across failure types. (b) Positive correlation between EOR and the number of passengers (Pearson r = 0.352; 95% CI 0.186–0.499; p < 0.001). (c) Sharp increase in blockage-related failures when the passenger count exceeds three, reaching >90% with six or more occupants.
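The interval method for the EOR–occupant correlation is not stated. If it is the standard Fisher z interval computed over all 122 missions, the following sketch reproduces the reported bounds:

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Confidence interval for a Pearson correlation via the Fisher z-transformation."""
    z = math.atanh(r)                               # variance-stabilising transform
    se = 1.0 / math.sqrt(n - 3)                     # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lo, hi = pearson_ci(0.352, 122)
print(round(lo, 3), round(hi, 3))  # 0.186 0.499
```

The interval is asymmetric around r because it is symmetric on the z scale and then mapped back through tanh.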
3.3. Occupancy-based prediction of elevator boarding failure: Monte Carlo calibration and fit
Using the simulator described in Section 2.5, we estimated the elevator boarding failure probabilities over the occupancy configurations (
Because several high-occupancy cells in the empirical matrix were informed by a single observation, we applied Jeffreys-prior (α=0.5) smoothing,23,24 as shown in Equation (3), where
The optimal parameter set
The calibrated model achieved an MSE of 135.4 pp² (RMSE ≈ 11.6 pp) against the smoothed empirical matrix. The side-by-side heatmaps in Figure 4(a) and (b) show close structural agreement: failure probability is low when the elevator is lightly loaded and increases non-linearly from
Figure 4. Occupancy-based prediction of elevator boarding failure and calibration. (a) Heatmap of measured failure rate (%) by the number of passengers and cargo items. (b) Simulated failure rate. (c) Failure case in simulation. (d) Success case in simulation.
Figure 4(c) and (d) illustrate these mechanisms using
Sparse empirical cells with single observations (e.g., unusual passenger–cargo combinations) deviate locally but are down-weighted during calibration; removing the weights or the smoothing does not alter the qualitative shape of the surface. Overall, the calibrated model, driven solely by observable occupancy, captures the nonlinear escalation of boarding failure with crowding and explains it geometrically: doorway-adjacent cargo and clustered passengers shrink the evasive space, precipitating blockages. This supports using the simulated surface as a practical policy aid (e.g., defer boarding for
Beyond the statistical calibration, the simulation reproduced the field-observed failure mechanisms. Specifically, doorway-adjacent cargo restricted lateral clearance and clustered passengers reduced the evasive space; these patterns emerged from the geometric model and aligned with field observations. This supports the validity of the simulation as both a fitting tool and a mechanistic account of congestion-induced boarding failures.
Importantly, these qualitative patterns were robust to reasonable perturbations of the simulation assumptions and calibration weights. While local deviations were observed in sparsely sampled occupancy cells, the overall topology of the failure surface—characterized by a sharp transition from feasible to infeasible boarding as occupancy increased—remained stable, strengthening confidence in the generalizability of the underlying mechanism.
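The smoothing and calibration objective can be sketched as follows, assuming that Equation (3) is the usual Jeffreys posterior mean (k + 0.5)/(n + 1) for k failures in n attempts, and that calibration weights each cell by its number of observations. The weighting scheme is not specified above, so treat it as hypothetical.

```python
import numpy as np


def jeffreys_rate(failures, trials):
    """Failure rate (%) smoothed with a Jeffreys Beta(0.5, 0.5) prior (posterior mean)."""
    failures = np.asarray(failures, dtype=float)
    trials = np.asarray(trials, dtype=float)
    return 100.0 * (failures + 0.5) / (trials + 1.0)


def weighted_mse(sim_pct, emp_pct, trials):
    """Observation-weighted MSE (pp^2) between simulated and empirical surfaces.
    Cells are weighted by their number of observations, so sparse cells count less."""
    w = np.asarray(trials, dtype=float)
    diff = np.asarray(sim_pct, dtype=float) - np.asarray(emp_pct, dtype=float)
    return float(np.sum(w * diff ** 2) / np.sum(w))


# A single failed attempt is smoothed to 75% rather than 100%; a single success to 25%.
print(jeffreys_rate([1, 0], [1, 1]))
print(np.sqrt(135.4))  # RMSE corresponding to the reported MSE, about 11.6 pp
```

The smoothing pulls single-observation cells toward 50%, which keeps one unusual mission from dominating the fit.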
3.4. Delivery time analysis: Delay patterns associated with elevator operating rate
A higher EOR was linked to a higher risk of delivery failure, driven largely by more passengers and cargo during peak times, but delays remained a concern even in successful deliveries. We therefore analyzed the 108 completed trials, focusing on the elevator waiting time (s), from the call at B2F to car entry, and the elevator travel time (s), from entry to exit at 1F.
Figure 5(a) shows a moderate positive correlation between EOR and elevator waiting time (r = 0.458, 95% CI 0.295–0.596, p < 0.001), particularly when the elevator was in transit rather than on standby. Figure 5(c) confirms that below 60% EOR, most calls found the elevator on standby, whereas above this threshold, calls placed while the elevator was in transit predominated.
Figure 5. Impact of elevator utilization on delay in successful deliveries. (a) EOR (%) vs elevator waiting time (s). (b) EOR vs elevator travel time (s). (c) EOR distribution by EV state. (d) EOR distribution by travel mode. Shaded bands denote 95% confidence intervals for the regression fits.
Figure 5(b) shows a weaker correlation between EOR and elevator travel time (r = 0.224, 95% CI 0.037–0.397, p = 0.020), owing to the short two-floor distance. However, Figure 5(d) reveals that trips with an intermediate stop consistently took longer and clustered in high-EOR periods (>70%).
Overall, a high EOR increased both the elevator waiting time and the elevator travel time, primarily through more frequent in-transit calls (EV state) and intermediate-stop trips (travel mode). These cumulative delays affected time-sensitive deliveries and reduced overall efficiency, imposing a non-negligible temporal burden even when deliveries succeeded.
3.5. Comparison of user survey results by occupational group
SUS scores varied significantly across occupational groups (Figure 6(a)). Support staff had the highest median SUS score (∼77.5), exceeding both nurses (p = 0.004) and pharmacists (p = 0.044), whereas nurses had the lowest median score. This disparity reflected how differently the robot affected each group’s workflow. For nurses and pharmacists, the robot introduced additional micro-tasks, such as walking farther to meet it, interacting with its touchscreen, and verifying contents. In contrast, the support staff, previously responsible for physically collecting and transporting medications, experienced a substantial reduction in manual workload, which likely contributed to their higher usability ratings.
Figure 6. Usability and workload by occupational group. (a) SUS (0–100) by group. (b) SUS item means (0–4), staff vs non-staff (nurses + pharmacists). (c) Overall NASA-TLX weighted score (0–10) by group. (d) NASA-TLX dimensions (0–10), staff vs non-staff. Bars/points show group summaries; error bars indicate variability across respondents.
The item-level analysis (Figure 6(b)) compared the support staff with a combined non-staff group (nurses and pharmacists). The staff scored higher on nearly all items, with statistically significant differences (p < 0.05) on four: Item 3 (“easy to use”), Item 7 (“learn quickly”), Item 8 (“cumbersome to use,” reverse scored), and Item 9 (“confidence in using the system”). These findings suggest that staff found the system intuitive and became competent quickly, which led to greater confidence. The only item on which the staff scored slightly lower was Item 5 (“functions well integrated,” p = 0.33), which likely reflected a dual-interface workflow (web-based call plus on-device touchscreen) that reduced perceived integration.
The NASA-TLX workload analysis (Figure 6(c)) revealed no significant overall differences among the three occupational groups. All three groups reported moderate workload levels, suggesting that the robot did not substantially increase or decrease the total perceived burden. The sub-dimension analysis (Figure 6(d)) showed that the staff group scored slightly lower than the non-staff group on most subscales (lower mental and physical demand) but marginally higher on temporal demand, plausibly linked to high-EOR periods. During peak congestion, longer elevator waiting and travel times (Section 3.4) imposed time pressure on personnel who met or loaded the robot. Thus, although the manual transport burden decreased, temporal stress persisted under high elevator utilization, underscoring the need to optimize delivery efficiency (e.g., through congestion-aware dispatch) during high-traffic periods.
4. Discussion
4.1. Principal findings and interpretation
This study showed that EOR strongly influenced delivery success, directly addressing the research question. A higher EOR was associated with a greater likelihood of delivery failure, and threshold-based classification proved effective. The ROC analysis (Figure 2) identified an optimal cutoff of 59.01% (Youden’s index), with an AUC of 0.779, indicating good predictive performance. Deliveries below this threshold achieved a success rate of approximately 95.5%, whereas high-EOR periods accounted for a disproportionate share of failures. Thus, scheduling dispatches to avoid peak congestion can substantially improve reliability.
Failure pattern analysis (Figure 3) revealed that “elevator blockage” errors—when passengers or carts obstructed boarding or exiting—occurred almost exclusively during times of extreme congestion (often >90% EOR), with failure rates approaching 100% in fully occupied cars. The Monte Carlo simulation (Figure 4) reproduced these patterns, showing a nonlinear escalation of failures as passenger and cargo counts increased. The close fit of the simulation to the empirical data supports a spatial collision mechanism: dense occupant clustering and doorway-adjacent cargo reduce clearance, leaving the robot unable to maneuver.
Even when deliveries were successful, a high EOR degraded efficiency. As shown in Figure 5, the elevator waiting time (s) increased significantly with EOR (r = 0.458, p < 0.001), especially when the elevator was already in transit and made multiple stops before arrival. The elevator travel time (s) showed a weaker but noticeable increase, particularly for trips with an intermediate stop. Although these delays did not always cause mission failure, they posed risks to time-sensitive tasks and contributed to workflow inefficiencies.
The user experience data (Figure 6) aligned with these operational findings. While the overall usability (SUS) remained high among the support staff, the NASA-TLX results indicated a slightly higher temporal demand for frequent robot users during congested periods. This suggests that slow or delayed deliveries under high-EOR conditions created perceived time pressures, even when the missions were successfully completed.
In summary, the EOR is a decisive determinant of autonomous robotic delivery performance. A congestion threshold of approximately 60% effectively distinguishes low- from high-risk conditions, with substantial impacts on both technical outcomes and staff workload. Keeping EOR below this cutoff through scheduling or adaptive control strategies can maximize delivery success and minimize operational strain.
4.2. Comparison with prior work and contributions to the field
Prior studies established the technical feasibility of elevator-capable hospital robots under typical conditions (e.g., alignment, car entry/exit, and floor traversal). Reports on modular navigation and deployments, as well as logistics-oriented analyses, often treated elevator availability as static or exogenous.25–28 Building-level multi-story routing studies likewise solved K-shortest-path problems on hierarchical graphs while ignoring elevator congestion and elevator microstates.29 In contrast, this study elevated congestion, quantified as EOR, from background noise to a central operational variable. Its strengths include (i) the use of continuous EOR measurements from real-world telemetry, (ii) the identification of a nonlinear EOR–failure relationship, and (iii) the quantification of efficiency penalties using the elevator waiting time and, to a lesser extent, the elevator travel time.
Moreover, the results highlight the value of interpretable elevator microstates for explaining latency pathways. We categorized calls by EV state (standby vs. in transit) and travel mode (direct vs. intermediate stop), thus linking macroscopic congestion to the proximate mechanics of delay and yielding implementable rules (e.g., defer dispatch if the car is in transit and EOR ≥ 60%).
Furthermore, whereas stochastic routing and scheduling work is largely simulation-only,30 we combined field measurements, ROC-derived thresholds, and occupancy-based Monte Carlo simulations to provide convergent evidence for a geometric occlusion mechanism that could generalize via common telemetry proxies (door/weight sensors and car-count logs). Finally, our quantification complements observational reports of “incomplete autonomy”31 by specifying when and why interruptions concentrate: in high-EOR windows with doorway-adjacent cargo and clustered passengers.
Collectively, these advances shift the focus from demonstrating raw elevator-riding capability to defining the operational conditions, given measurable congestion, under which autonomous dispatch is appropriate. This reframing has immediate implications. Future studies should report EOR as a standard covariate to enable comparison across sites. Practically, hospitals can operationalize a tunable ∼60% cutoff with microstate-aware rules using signals already available in building systems. Taken together, these findings provide a practical foundation for the congestion-aware deployment of autonomous medication delivery systems in multi-floor hospitals.
4.3. Operational implications and deployment guidance
The findings of this study have three implications for daily operations. First, the fleet manager should use the Elevator Operating Rate (EOR) as a control signal rather than a retrospective metric. The ROC-derived cutoff (59.01% EOR) can be encoded as a dispatch rule: below the boundary, standard dispatch applies; at or above it, the system enters congestion mode, defers and bundles non-urgent jobs, and escalates to human couriering if delay penalties exceed targets.
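The dispatch rule can be sketched as follows. This is a hypothetical policy under stated assumptions: the `Job` type, the penalty inputs, and the choice to escalate only urgent jobs to a human courier are illustrative, and the 59.01% cutoff is the site-specific value reported here, which other sites would need to recalibrate.

```python
from dataclasses import dataclass

# ROC-derived cutoff reported in this study; site-specific, recalibrate locally.
EOR_CUTOFF = 59.01


@dataclass
class Job:
    job_id: str
    urgent: bool


def plan_dispatch(jobs, eor, expected_penalty_s, penalty_target_s, cutoff=EOR_CUTOFF):
    """Split jobs into 'dispatch', 'deferred' (to be bundled later) and 'courier'.

    Hypothetical policy: below the cutoff every job is dispatched normally.
    At or above it (congestion mode), non-urgent jobs are deferred for
    bundling, and urgent jobs are escalated to a human courier when the
    expected delay penalty exceeds the target; otherwise they are dispatched.
    """
    plan = {"dispatch": [], "deferred": [], "courier": []}
    if eor < cutoff:
        plan["dispatch"].extend(jobs)
        return plan
    for job in jobs:
        if not job.urgent:
            plan["deferred"].append(job)
        elif expected_penalty_s > penalty_target_s:
            plan["courier"].append(job)
        else:
            plan["dispatch"].append(job)
    return plan
```

For example, at 70% EOR with an expected penalty of 120 s against a 60 s target, an urgent job would be escalated to a courier and a routine job deferred for bundling.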
Second, dispatch predictions should incorporate elevator microstates to improve prioritization. The EV state (standby vs. in transit) and travel mode (direct vs. intermediate stop) jointly predict near-term Elevator Waiting Time (s) and Elevator Travel Time (s). When EOR is high, the EV state is in transit, and the travel mode is intermediate stop, the system should resequence routes or stage robots on floors with better access;32 when EOR is moderate and the EV state is standby, the system should opportunistically clear queued tasks.
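A minimal sketch of microstate-aware prioritization, assuming string labels for the EV state and travel mode and treating any EOR below the high-congestion cutoff as moderate (an assumption, since the text does not define a separate moderate band). The action names returned by the hypothetical `choose_action` function are placeholders for site-specific behaviors.

```python
def choose_action(eor, ev_state, travel_mode, high_eor_cutoff=59.01):
    """Map congestion level and elevator microstate to a coarse dispatch action.

    ev_state: "standby" or "in_transit"; travel_mode: "direct" or
    "intermediate_stop". Action names are hypothetical placeholders.
    """
    high = eor >= high_eor_cutoff
    if high and ev_state == "in_transit" and travel_mode == "intermediate_stop":
        # Worst-case microstate under congestion: avoid waiting at this bank.
        return "resequence_or_stage"
    if not high and ev_state == "standby":
        # Car is idle and congestion is moderate: use the window to clear the queue.
        return "clear_queue"
    return "standard"
```

All other combinations fall through to standard dispatch, which keeps the rule set small and auditable.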
Third, hospital logistics policy and the user interface should embed congestion awareness by design. Pharmacy and ward teams should define urgency tiers (e.g., STAT, urgent, routine) with explicit allowances for elevator latency, and the dispatch UI should expose predicted EOR, EV State, and ETA ranges while committing to arrival windows rather than point estimates. When Elevator Waiting Time (s) exceeds a dynamic hour-of-day threshold, the system should auto-escalate to human couriering and record the handoff to preserve the chain of custody.
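One way to implement a dynamic hour-of-day threshold is to take a high percentile of historical waiting times for each hour. The sketch below assumes the 90th percentile (inclusive method of Python's `statistics.quantiles`), skips hours with fewer than two observations, and falls back to a fixed 120 s limit; both the percentile and the fallback are hypothetical choices, not values from this study.

```python
from collections import defaultdict
from statistics import quantiles

FALLBACK_THRESHOLD_S = 120.0  # hypothetical default for hours without history


def hourly_wait_thresholds(history):
    """Build {hour_of_day: 90th-percentile waiting time in seconds}.

    history: iterable of (hour_of_day, wait_s) pairs from past missions.
    Hours with fewer than two observations are skipped.
    """
    by_hour = defaultdict(list)
    for hour, wait_s in history:
        by_hour[hour].append(wait_s)
    thresholds = {}
    for hour, waits in by_hour.items():
        if len(waits) >= 2:
            # quantiles(..., n=10) returns 9 cut points; the last is the 90th percentile.
            thresholds[hour] = quantiles(waits, n=10, method="inclusive")[-1]
    return thresholds


def should_escalate(hour, current_wait_s, thresholds):
    """True if the current wait exceeds the threshold for this hour of day."""
    return current_wait_s > thresholds.get(hour, FALLBACK_THRESHOLD_S)
```

Recomputing the thresholds periodically (e.g., nightly) lets the escalation point follow shifts in building traffic without manual retuning.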
4.4. User experience interpretation and human factors
Two points summarize the user results. First, SUS scores were high, indicating that the system was easy to learn and operate. Second, high EOR and longer Elevator Waiting Time (s) were still associated with a temporal burden, even when missions were completed successfully. Ease of use can therefore coexist with time pressure when delays arise from elevator congestion rather than from the interface.
Importantly, this temporal burden appears to be shaped by role-specific workflow expectations. Staff members who frequently coordinated robot dispatch and receipt often remained in a “waiting mode” during high-EOR periods, creating a sense of time pressure despite minimal physical effort. In contrast, nurses and pharmacists, whose primary clinical tasks were less tightly coupled to the robot’s arrival time, reported lower temporal demand even when overall usability was perceived as lower.
Prior HRI findings align with this pattern: users may value correct robot behavior yet feel dissatisfied if outcomes violate expectations or lack timely feedback.33 When anticipated arrival times are uncertain or extended without explicit explanation, users may perceive a loss of control over their workflow, amplifying temporal demand, and delays without clear status updates are likely to increase perceived workload. Transparency and predictability therefore become central during congestion.
Tracking human factors alongside technical metrics can close the feedback loop between system performance and user experience. Monitoring SUS and NASA-TLX Temporal Demand together with EOR, Elevator Waiting Time (s), and Elevator Travel Time (s) enables targeted adjustments. In practice, congestion-aware dispatch, microstate-informed prioritization, and explicit temporal feedback should help preserve usability while reducing perceived time pressure in high-traffic settings.
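For routine monitoring, SUS questionnaires can be scored with the standard rule: odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5 to give a 0–100 score. The sketch below implements that rule; pairing the resulting scores with EOR or waiting-time logs is left to site-specific data pipelines.

```python
def sus_score(responses):
    """Score one System Usability Scale questionnaire (ten items, 1-5 Likert).

    Odd-numbered items contribute (response - 1); even-numbered items
    contribute (5 - response). The sum (0-40) is scaled by 2.5 to 0-100.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1
        else:
            total += 5 - response
    return total * 2.5
```

A neutral respondent who answers 3 to every item scores 50, the midpoint of the scale.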
4.5. Limitations
Several constraints should be considered when interpreting these results. First, the study was conducted at a single tertiary hospital using one commercially available robot platform. Building geometry, elevator bank layout, passenger profiles, and organizational practices differ across sites.34 Accordingly, while the proposed analytical framework, which combines Elevator Operating Rate (EOR) quantification with elevator microstate analysis, is expected to generalize, the specific ROC-derived threshold identified in this study (59.01% EOR) should be interpreted as site-specific and will likely require recalibration in other hospital environments.35,36 In particular, operational characteristics such as elevator cabin dimensions, the use and physical size of transport carts, and the composition of elevator occupants (e.g., staff-only elevators versus elevators shared with patients who may have limited familiarity with robotic systems) may substantially influence congestion dynamics and the effective EOR threshold. This limitation reflects not a weakness of the approach itself but the contextual dependence inherent to shared-infrastructure systems such as hospital elevators.
Second, the number of delivery missions and respondents was modest. Although the key effects were consistent in direction, some between-group comparisons, especially in user-reported outcomes, did not reach statistical significance (p > 0.05). Therefore, type II errors cannot be excluded. Third, the Monte Carlo model simplifies several real-world phenomena. Parameters, such as the evasive movement length and collision trigger distance, were tuned to best fit the study setting, and the model did not explicitly incorporate variable door-dwell logic, car acceleration profiles, emergency overrides, or atypical human behaviors (e.g., riders holding doors for prolonged periods). These omissions likely explain the residual error at the extreme tail of congestion and point to the need for layout- and controller-aware modeling in future studies.36 Finally, the study focused on single-robot operations and did not measure longer-term adaptation by either staff or the fleet manager. Multi-robot coordination, cross-bank interference, and cross-departmental demand shaping were beyond the scope of this study, but could materially alter congestion patterns.34,37
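To make the kind of simplification discussed above concrete, the following is a deliberately toy Monte Carlo sketch of doorway occlusion; it is not the model used in this study. Every geometric parameter (cabin size, robot half-width, `trigger_dist` standing in for the collision trigger distance, `evasive_len` standing in for the evasive movement length) and the mapping from EOR to occupant count (a binomial draw over a hypothetical cabin capacity) are assumptions chosen only for illustration. Like the study's model, it ignores door-dwell logic, car acceleration, and atypical rider behavior.

```python
import random


def simulate_failure_rate(eor_pct, trials=10_000, capacity=12, cabin_w=1.8,
                          cabin_d=2.4, robot_half_w=0.35, trigger_dist=0.15,
                          evasive_len=0.6, needed_depth=0.8, seed=0):
    """Toy Monte Carlo estimate of boarding-failure probability at a given EOR.

    Illustrative assumptions (not the study's model):
    - occupant count ~ Binomial(capacity, eor_pct / 100);
    - occupants are points placed uniformly in a cabin_w x cabin_d cabin (m);
    - the robot enters straight along the centreline to needed_depth;
    - each occupant sidesteps away from the centreline, but the sidestep
      shrinks linearly as the cabin fills and is limited by the wall;
    - the boarding fails if any occupant in the entry path remains closer than
      robot_half_w + trigger_dist to the centreline after sidestepping.
    """
    rng = random.Random(seed)
    p_occupied = eor_pct / 100.0
    clearance = robot_half_w + trigger_dist
    centre = cabin_w / 2.0
    failures = 0
    for _trial in range(trials):
        n = sum(1 for _slot in range(capacity) if rng.random() < p_occupied)
        sidestep = evasive_len * max(0.0, 1.0 - n / capacity)
        for _occupant in range(n):
            x = rng.uniform(0.0, cabin_w)
            y = rng.uniform(0.0, cabin_d)
            if y > needed_depth:
                continue  # occupant stands beyond the robot's entry path
            offset = min(abs(x - centre) + sidestep, centre)
            if offset < clearance:
                failures += 1
                break  # one blocking occupant is enough to fail the boarding
    return failures / trials


if __name__ == "__main__":
    for eor in (20, 40, 60, 80, 95):
        print(eor, round(simulate_failure_rate(eor), 3))
```

Because the sidestep shrinks as occupancy grows, failure probability in this toy rises faster than linearly with EOR, which is the qualitative shape a layout- and controller-aware model would need to reproduce.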
When true multi-site trials are infeasible in the near term, a complementary strategy is to use hospital digital twins to stress-test generalizability before field deployment. Space-aware data integration can reproduce site-specific geometry, traffic composition, and elevator controller policies to evaluate whether the form of the EOR–failure relationship and the associated blockage mechanisms persist under varied layouts, even as absolute threshold values shift.38 Likewise, simulation-based digital twin pipelines can stage counterfactual scenarios, policy sweeps, and “shadow-mode” validations that combine live telemetry with in-silico experimentation.39 Such models cannot replace comparative, multi-site studies, but they can help prioritize configurations, reduce external-validity risk, and inform the design of prospective trials by estimating expected Elevator Waiting Time (s), Elevator Travel Time (s), and failure modes under diverse operating conditions.
4.6. Future directions
Future work should broaden external validity and advance mechanism-aware control. A multi-site, multi-vendor program should test whether the ∼60% EOR decision boundary and the observed microstate effects generalize across different elevator controllers and traffic regimes. Within each site, staged rollouts can compare congestion-aware dispatch with the current dispatch method using prospective randomized or quasi-experimental designs. Co-primary outcomes should be delivery success and elevator waiting time (s), and secondary endpoints should include human factor measures (SUS, NASA-TLX Temporal Demand).
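For the delivery-success co-primary outcome, one common analysis option is a two-sided two-proportion z-test with a pooled standard error. The sketch below assumes that choice and relies on a large-sample normal approximation, which may be inappropriate for small arms; an exact test would then be preferable.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference in success proportions (pooled SE).

    Returns (z, p_value). Relies on a large-sample normal approximation.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        # Both arms all-success or all-failure: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

Waiting time, the other co-primary outcome, is typically right-skewed, so a rank-based or log-scale comparison would usually be paired with this test.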
Methodologically, richer telemetry can improve prediction and control. Real-time car occupancy estimation from weight sensors and door cycle counters combined with floor-to-floor elevator travel time (s) histories can provide more accurate forecasts of the EV state (standby vs. in-transit) and travel mode (direct vs. intermediate stop). Learning-based prioritization policies can then minimize expected waiting under service-level constraints while retaining interpretable guardrails derived from the ROC analysis.40 Extending the Monte Carlo model to include door-dwell controllers, passenger intent, and robot–human courtesy rules would better capture extreme congestion dynamics and inform safer boarding policies.
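A minimal sketch of two such telemetry-derived features, under loudly stated assumptions: passenger count is approximated as (car load − known cargo mass) divided by a hypothetical 70 kg average passenger mass, and expected floor-to-floor travel time is a rolling mean over the most recent trips for each origin–destination pair. Neither the constant nor the class is drawn from this study or from any elevator vendor API.

```python
from collections import deque
from statistics import mean

AVG_PASSENGER_KG = 70.0  # hypothetical average passenger mass


def estimate_passengers(load_kg, known_cargo_kg=0.0, avg_passenger_kg=AVG_PASSENGER_KG):
    """Rough passenger-count estimate from a car load-weighing reading."""
    return max(0.0, load_kg - known_cargo_kg) / avg_passenger_kg


class TravelTimeHistory:
    """Rolling mean of recent travel times per (origin, destination) floor pair."""

    def __init__(self, window=20):
        self._window = window
        self._history = {}

    def add(self, origin, destination, travel_s):
        trips = self._history.setdefault((origin, destination), deque(maxlen=self._window))
        trips.append(travel_s)

    def expected(self, origin, destination):
        """Mean of the most recent trips, or None if the pair has no history."""
        trips = self._history.get((origin, destination))
        return mean(trips) if trips else None
```

A travel time well above the recent mean for the same floor pair is one simple, interpretable hint that the trip included an intermediate stop.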
Two experiments are immediately feasible. First, multi-robot scheduling that explicitly manages elevator contention (for example, staggering calls or short, time-boxed car reservations during EOR spikes) should be evaluated, together with its spillover effects on non-robot users.41 Second, medication criticality should be incorporated into dispatch so that high-priority items are routed to low-EOR windows, alternative banks, or manual couriering with explicit escalation logic; telemetry-driven occupancy inference and “smart elevator” signals can enable these decisions in real time.42 Finally, longitudinal audits should test whether staff temporal demand decreases after deploying transparency features (EOR display, microstate-based ETAs) and whether policy updates sustain performance as building traffic patterns evolve.
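The criticality-aware routing idea can be sketched as a small decision function. The tier names follow the urgency tiers suggested in Section 4.3; the per-bank EOR input, the reuse of the 59.01% cutoff for every bank, and the escalation order (least-congested bank, then deferral for routine items, then human courier) are hypothetical assumptions.

```python
EOR_CUTOFF = 59.01  # study-specific cutoff; other sites would recalibrate


def route_item(tier, bank_eor, cutoff=EOR_CUTOFF):
    """Choose how to deliver one medication item.

    tier: "STAT", "urgent" or "routine".
    bank_eor: {elevator_bank_name: current EOR %}.
    Returns ("robot", bank), ("defer", None) or ("courier", None).
    """
    if tier not in ("STAT", "urgent", "routine"):
        raise ValueError(f"unknown tier: {tier}")
    if bank_eor:
        best_bank = min(bank_eor, key=bank_eor.get)
        if bank_eor[best_bank] < cutoff:
            # At least one bank is below the cutoff: use the least congested one.
            return ("robot", best_bank)
    if tier == "routine":
        # Every bank is congested: wait for a low-EOR window.
        return ("defer", None)
    # STAT or urgent items with no uncongested bank escalate to a human courier.
    return ("courier", None)
```

Returning an explicit action tuple makes each escalation decision easy to log, which supports the chain-of-custody record described in Section 4.3.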
5. Conclusion
This feasibility study confirmed that autonomous medication delivery robots can operate reliably in a multi-floor tertiary hospital, but their performance is tightly conditioned by elevator congestion. EOR emerged as a first-order determinant, and ROC analysis identified an operational cutoff near 59.01%, below which delivery success was approximately 95.5% and above which failures were likely to rise sharply. The dominant failure mechanism was elevator blockage, that is, passengers or carts restricting doorway clearance. Monte Carlo modeling, parameterized by physical occupancy, reproduced this nonlinear escalation, supporting a geometric occlusion pathway rather than sensing or software faults.
Congestion also reduced the efficiency of successful missions. A higher EOR was associated with longer elevator waiting time (s) and, to a lesser extent, elevator travel time (s), especially when the elevator’s EV state at call was “in transit” and the subsequent travel mode was “intermediate stop”. These patterns help explain why users reported high SUS scores yet experienced elevated temporal demand during congested periods.
Collectively, these results shift the emphasis from raw technical feasibility to operational viability under measurable infrastructure states. Treating EOR as a control signal and leveraging elevator microstates (EV state and travel mode) provide a practical basis for congestion-aware dispatch. Hospitals can integrate real-time EOR monitoring, surface EV states and travel modes in dispatch interfaces, and apply adaptive scheduling, routing, and escalation policies to maintain reliability while limiting staff workload.
These recommendations are immediately actionable and scalable. The required inputs are available from standard building telemetry (e.g., door cycles, car counts, and weight sensors), and the control rules are interpretable and tunable to local risk tolerance. Although multi-site, multi-vendor validation remains important, adopting congestion-aware strategies can improve delivery success, shorten waiting times, and help meet the timing constraints of clinical workflows in high-traffic environments.
Supplemental material
Supplemental material - Feasibility of autonomous medication delivery robots considering elevator utilization in high-traffic hospital environments
Supplemental material for Feasibility of autonomous medication delivery robots considering elevator utilization in high-traffic hospital environments by Yourack Lee, Song-Ee Kim, Jeong Su Kim, Hyeong Guk Son, Sung Hyeon Lee, Hyung-joo Lee, Hyobeen Jeong, Jin Woo Lim, Sug Young Park, Yong Joo Lee, So Jung Kang, Chanho Park, Jung Boone Kim, Minsun Kim, Jang Pyo Yu, Quoc Huy Tran, Jun Seo Park, Il-Ho Park in Digital Health
Footnotes
Acknowledgments
We thank the hospital staff and technical team for their cooperation.
Ethical considerations
This study was approved by the institutional review board (IRB) of Korea University Guro Hospital (IRB No. 2025GR0445).
Author contributions
Conceptualization: Hyung-joo Lee, Hyobeen Jeong, Jin Woo Lim, Sug Young Park, Yong Joo Lee, So Jung Kang, Chanho Park, Jung Boone Kim, and Minsun Kim. Methodology: Yourack Lee and Jang Pyo Yu. Investigation: Yourack Lee, Song-Ee Kim, and Jeong Su Kim. Formal analysis: Yourack Lee. Software: Jang Pyo Yu, Quoc Huy Tran, and Jun Seo Park. Resources: Hyung-joo Lee, Hyobeen Jeong, Jin Woo Lim, and Sug Young Park. Writing—original draft: Yourack Lee. Writing—review and editing: Yourack Lee, Hyeong Guk Son, and Il-Ho Park. Funding acquisition: Sung Hyeon Lee. Supervision: Il-Ho Park.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the Korea University Guro Hospital (KOREA RESEARCH-DRIVEN HOSPITAL) (No. O2513551) and the Korea Institute for Robot Industry Advancement (KIRIA) under the 2024 Regulatory Innovation Robot Demonstration Program.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets used and/or analyzed in the current study are available from the corresponding author upon reasonable request.
Declaration of generative AI and AI-assisted technologies in the writing process
The authors used ChatGPT (GPT-4o and GPT-5) as a language-editing tool to support the writing process. The authors carefully reviewed and edited the generated content and take full responsibility for the content of the published article.
Supplemental material
Supplemental material for this article is available online.
