Abstract
Objective:
To evaluate operator workload and usability of a dual-arm teleoperated robotic platform designed for solo-physician outpatient otorhinolaryngology (ENT) procedures.
Methods:
A dual-arm system integrating a flexible endoscope manipulator (EndoBot) and an instrument manipulator (ToolBot) was developed and controlled via dual joysticks and a multi-view interface. Five otolaryngologists performed simulated nasal and oral procedures on an anatomical phantom and completed standardized workload (NASA-Task Load Index [TLX]) and usability questionnaires.
Results:
All scheduled trials were completed successfully. Workload was higher for nasal than for oral tasks (NASA-TLX total score, sum of six 0-10 subscales; nasal: 44.0 vs oral: 22.2), which may reflect the narrower nasal workspace and more frequent view management. Key limitations included conservatively restricted motion speed, limited depth perception in multi-view teleoperation, and timing metrics defined from entry pose to task completion (excluding full setup/docking steps).
Conclusion:
These preliminary phantom-based findings characterize early-operator workload and interface constraints of a dual-arm teleoperated system for outpatient ENT procedures. The results provide hypothesis-generating signals to guide system refinement. Future patient-based, head-to-head studies are required to assess full workflow performance and clinical applicability.
Introduction
Otorhinolaryngology (ENT) outpatient procedures require precise bimanual control of an endoscope and instruments within narrow anatomical spaces such as the nasal cavity, pharynx, larynx, and external auditory canal, increasing procedural complexity and limiting workflow efficiency. 1 Moreover, the close proximity between physician and patient during these procedures poses an elevated risk of infection transmission, which became a critical concern during the COVID-19 pandemic.
Recent advances in medical robotics have expanded their use across surgical fields, including otolaryngology (eg, transoral robotic thyroidectomy).2,3 However, their application to outpatient ENT procedures remains limited. Existing robotic or assistive platforms have focused on isolated functions such as endoscope navigation or single-instrument manipulation,4,5 which do not reflect the integrated bimanual coordination required in routine ENT practice. Additionally, advanced multi-channel therapeutic endoscopes6 remain costly and technically demanding, hindering widespread adoption. To address these limitations, we developed a dual-arm robotic system designed specifically for outpatient ENT procedures.7-10 The platform integrates an endoscope arm for stable visualization and a tool arm for therapeutic manipulation, allowing a single physician to perform various procedures without direct assistance. This design aims to enhance procedural independence, reduce infection risk, and improve overall workflow efficiency.
Evaluating the feasibility of such a system requires more than simple task completion. Traditional usability scales for teleoperation and surgical interfaces do not adequately reflect the cognitive and ergonomic challenges of the developed outpatient system.11-14 Therefore, we developed a structured workload assessment tool to measure surgeons’ physical and mental effort, integrating established metrics with additional items tailored to this system’s operational characteristics.
This exploratory pilot study aimed to evaluate the system’s initial feasibility and operator workload in simulated outpatient ENT procedures, with the primary goal of generating design-informing and workflow-informing data to support subsequent controlled and patient-based studies. Using a developed workload assessment, we analyzed task performance and surgeon feedback across representative nasal and oral task sets. The findings provide preliminary feasibility and usability signals for single-physician robotic ENT procedures and lay the groundwork for future patient-based evaluation.
Materials and Methods
Robotic System Design and Evaluation Setup
This study was initiated as a collaborative project between the medical staff of Hanyang University Hospital and the College of Engineering of Hanyang University ERICA campus. The primary goal was to develop and evaluate a robotic system capable of enabling remote ENT procedures in a simulated outpatient environment, allowing physicians to perform examinations or minor interventions without direct physical proximity to the patient. Real-time endoscopic and environmental video streams are displayed at the control station, and the operator controls both the flexible endoscope and instruments using 2 joystick controllers (Logitech 3D Pro), allowing bimanual teleoperation without an in-room assistant (Figure 1 and Supplemental Material).

Figure 1. GUI displaying the 4 synchronized views: the main endoscopic view and 3 auxiliary views (tool-tip view, frontal face, and side face) arranged in the left column. GUI, graphical user interface.
The endoscopic manipulation unit, termed EndoBot, was developed to remotely control a flexible fiberoptic endoscope (Olympus ENF-P4) with stable precision (Figure 2A). It integrates a custom-designed actuation module 15 that allows 3 essential motions of the endoscope: tilting of the distal tip, rotation of the tip, and controlled advancement or withdrawal of the flexible shaft. The “ToolBot” serves as the instrument-handling unit of the system (Figure 2B). It is designed to remotely operate conventional ENT instruments such as suction tips, forceps, and swabs. A camera attached near the instrument tip provides a close-up view that assists remote manipulation in narrow anatomical spaces. The multi-view configuration combined endoscopic, frontal, lateral, and tool-tip perspectives. Together, these features allow the operator to perform basic procedural tasks, such as suction, biopsy, or swabbing, without assistance (Figure 2C).
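As a rough illustration of how joystick input could drive the 3 EndoBot motions described above, the following sketch maps 3 normalized joystick axes to tilt, rotation, and advancement rate commands. It is not the authors' control software. The axis assignment, deadzone, and angular rate limits are assumptions made for illustration; only the 5 mm/second translation cap is taken from the study (see Discussion).

```python
from dataclasses import dataclass

MAX_FEED_MM_S = 5.0     # translation speed cap reported in the Discussion
MAX_TILT_DEG_S = 10.0   # assumed value, not reported in the study
MAX_ROLL_DEG_S = 10.0   # assumed value, not reported in the study
DEADZONE = 0.1          # assumed: ignore small stick offsets around center


@dataclass
class EndoBotCommand:
    tilt_deg_s: float   # distal tip tilting rate
    roll_deg_s: float   # tip rotation rate
    feed_mm_s: float    # shaft advancement (+) or withdrawal (-) rate


def shape_axis(value: float, deadzone: float = DEADZONE) -> float:
    """Clamp a raw axis reading to [-1, 1] and apply a deadzone."""
    value = max(-1.0, min(1.0, value))
    if abs(value) < deadzone:
        return 0.0
    # Rescale so the output ramps from 0 at the deadzone edge to 1 at full deflection.
    sign = 1.0 if value > 0 else -1.0
    return sign * (abs(value) - deadzone) / (1.0 - deadzone)


def joystick_to_command(pitch: float, twist: float, throttle: float) -> EndoBotCommand:
    """Map three normalized axes (hypothetical assignment) to rate commands."""
    return EndoBotCommand(
        tilt_deg_s=shape_axis(pitch) * MAX_TILT_DEG_S,
        roll_deg_s=shape_axis(twist) * MAX_ROLL_DEG_S,
        feed_mm_s=shape_axis(throttle) * MAX_FEED_MM_S,
    )
```

Because every axis is clamped before scaling, an out-of-range reading cannot command more than the configured cap, which is one simple way to enforce a conservative speed limit in software.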

Figure 2. Experimental hardware setup at the follower site. (A) The dual-robot system (EndoBot and ToolBot) and anatomical phantom torso. (B) Side view illustrating the ToolBot-mounted gripper during a nasal swab procedure. (C) The corresponding intra-corporeal endoscopic view of the target region and inserted swab.
To assess system feasibility, we conducted phantom-based experiments with 5 otolaryngologists (2 specialists with >5 years of experience and 3 residents). A realistic upper-torso anatomical phantom with detailed nasal, oral, and pharyngeal cavity structures simulated outpatient conditions (Figure 3). At the follower site, the phantom was seated on an examination chair equipped with adjustable chin and forehead supports to emulate routine clinic positioning. The chair height was adjusted so that the phantom’s nasal entrance was ~1.4 m above the floor. The head was oriented to face forward with slight extension induced by the chin/forehead rest, providing a stable and repeatable posture across trials. This setup was designed to approximate typical patient posture and head stabilization during outpatient ENT examinations.

Figure 3. Anatomical views of the phantom used. (A) Oral cavity. (B) Larynx (vocal folds) accessed via the hypopharynx. (C) Nasal cavity.
From the remote-control station, participants performed 2 task sets. The nasal set consisted of suction, swab manipulation, and forceps-based manipulation under flexible endoscopic visualization; the forceps task simulated biopsy by grasping a predefined target region on the phantom. The oral set consisted of a continuous exploratory transoral inspection from the oral cavity to the vocal folds, followed by a simulated forceps task and oral suction. Each physician performed 3 repetitions for each task set; the inspection was performed as a single continuous sequence and counted as 1 trial. Execution time was defined as the interval from a standardized entry pose (outside of the nostril or oral opening) to task completion, encompassing both approach and simulated manipulation phases. Setup and docking were not included because the flexible endoscope remained pre-mounted on the EndoBot throughout the session; full outpatient workflow timing (including setup/draping and docking steps) should be quantified in future patient-based studies.
Specialized Questionnaire Development and Metrics
To effectively evaluate the feasibility of robotic systems, we developed a structured multi-domain questionnaire informed by established frameworks (Table 1). ENT specialists and engineers collaboratively developed the questionnaire to assess difficulty, workload, usability, and core feedback on clinical applicability. The questionnaire integrates the NASA-Task Load Index (TLX)16 for workload assessment and partially adapted components from the Post-Study System Usability Questionnaire (PSSUQ), the Technology Acceptance Model (TAM), and the Unified Theory of Acceptance and Use of Technology (UTAUT)17-20 for usability and technology acceptance. Situation awareness items were derived from Endsley’s 3-level model21 (perception, comprehension, projection) and adapted for multi-view teleoperation. Additional questions assessed prior interface experience and ergonomic aspects of the joystick-based control.
Table 1. Physician Questionnaire With Subjective Items and Objective Performance Indicator (OPI) Metrics.
Abbreviations: CP, conventional procedure; OPIs, objective performance indicators; PSSUQ, Post-Study System Usability Questionnaire; PUI, physical user interface; RP, robotic procedure; SART, Situation Awareness Rating Technique; TLX, Task Load Index.
The final instrument comprises 5 sections (A-E). Section A (pretrial) evaluates baseline difficulty with conventional methods (A-1), initial understanding of the system (A-2), and technology acceptance (A-3). Sections B to E are completed after robotic manipulation: Section B covers workload and usability, Section C covers interface and control ergonomics, Section D covers perceived clinical applicability, and Section E covers overall evaluation with task difficulty ratings and open-ended comments regarding advantages and limitations.22
The questionnaire was administered sequentially. Participants completed Section A before using the robot and then completed Sections B to E immediately after each of the robotic nasal and oral tasks to minimize recall bias. All items used a 10-point scale. For the difficulty and workload domains, lower scores indicated lower difficulty or workload. For usability/acceptance and other items, lower scores generally reflected more favorable evaluations (eg, stronger agreement or better perceived usability), unless explicitly reverse-coded.
In parallel with subjective data, we logged 2 objective performance indicators (OPIs)23: (1) total execution time, including the approach and manipulation phases, and (2) camera-switching frequency as an indicator of attention shifts. This combined instrument was used as an exploratory assessment tool and was not intended to serve as a formally validated psychometric scale.
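The 2 OPIs can be derived from a timestamped event log. The sketch below is a minimal illustration under stated assumptions: the `Event` record and the event names (`entry_pose`, `view_switch`, `task_complete`) are hypothetical and do not describe the logging format actually used in the study. Execution time follows the definition given above (entry pose to task completion), and only view switches inside that window are counted.

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: float    # seconds since session start
    kind: str   # hypothetical labels: "entry_pose", "view_switch", "task_complete"


def compute_opis(events: list[Event]) -> tuple[float, int]:
    """Return (execution_time_s, camera_switch_count) for one trial."""
    ordered = sorted(events, key=lambda e: e.t)
    # Timing starts at the standardized entry pose, not at session start.
    start = next(e.t for e in ordered if e.kind == "entry_pose")
    end = next(e.t for e in ordered if e.kind == "task_complete" and e.t >= start)
    # Count only the view switches that fall inside the timed window.
    switches = sum(
        1 for e in ordered if e.kind == "view_switch" and start <= e.t <= end
    )
    return end - start, switches


# Illustrative log with made-up timestamps (not study data).
log = [
    Event(0.0, "view_switch"),      # before entry pose, so not counted
    Event(2.0, "entry_pose"),
    Event(10.0, "view_switch"),
    Event(30.5, "view_switch"),
    Event(42.0, "task_complete"),
]
print(compute_opis(log))  # (40.0, 2)
```

Restricting the count to the timed window keeps the two indicators aligned, so that setup activity before the entry pose does not inflate the switching frequency.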
Results
Five otolaryngologists performed nasal and oral examinations. Their experiences were evaluated using a custom questionnaire (Table 1); responses are summarized in Table 2. The logged OPI metrics are reported in the Objective Performance Summary. All scheduled trials were completed successfully, and no re-docking events were encountered during the pilot sessions; troubleshooting events were not separately instrumented.
Table 2. Summary of Subjective (Questionnaire) and Objective (OPI) Measures.
Pretrial Assessment (Section A)
Pretrial assessments indicated that participants found conventional procedures to be generally easy to perform (oral: 1.6 ± 0.9; nasal: 3.0 ± 2.7). Initial technology acceptance (Section A-3) was favorable. Participants showed positive attitudes toward the system’s potential usefulness, particularly in improving procedural quality (5.2 ± 2.28) and productivity (3.6 ± 2.19). They also agreed that the system could be “more useful than conventional methods” (4.8 ± 2.68). Perceived social influence and facilitating conditions were moderately positive, indicating that participants felt institutional and technical support were adequate, though additional training would be beneficial.
Post-Task Evaluation (Sections B-E)
Post-task workload assessments (TLX, Section B-1) revealed a clear difference between the 2 procedures. The nasal tasks imposed a substantially higher overall workload (7.33 ± 2.17) than the oral tasks (3.7 ± 2.02). Among the 6 workload dimensions, effort (9.00 ± 1.73) and performance (8.00 ± 1.41) were rated highest for the nasal procedure, reflecting greater difficulty and technical demand in the narrower workspace. In contrast, the oral task workload was rated as moderate.
System usability evaluated by the PSSUQ (Section B-2) showed mixed but generally positive results. Participants rated the system as easy to learn (1.20 ± 1.10) and intuitive to operate. Situation awareness, measured by the Situation Awareness Rating Technique (Section B-3), was rated as moderate. In particular, the “realism of the video field” received the least favorable ratings, as participants reported difficulty adapting to multiple camera perspectives and perceiving spatial depth and 3-dimensional orientation. Meanwhile, the physical interface ergonomics (Section C) were rated favorably: intuitive layout (1.60 ± 0.89) and low operator fatigue (4.40 ± 3.29, reversed scale) were noted as distinct strengths of the system.
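The overall workload values above appear to be per-subscale means, which relate to the NASA-TLX totals reported elsewhere in this article (nasal: 44.0; oral: 22.2; sum of six 0-10 subscales) by a factor of 6. The following sketch shows this unweighted (raw) scoring. The function name and dictionary keys are illustrative, and the 6 subscale names are the standard NASA-TLX dimensions rather than labels taken from the study questionnaire.

```python
SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings: dict[str, float]) -> tuple[float, float]:
    """Return (total, mean) of six unweighted 0-10 subscale ratings."""
    missing = [name for name in SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for name in SUBSCALES:
        if not 0 <= ratings[name] <= 10:
            raise ValueError(f"{name} must be between 0 and 10")
    total = sum(ratings[name] for name in SUBSCALES)
    return total, total / len(SUBSCALES)


# Consistency check on the reported group totals (maximum possible total: 60).
print(round(44.0 / 6, 2))  # 7.33, matches the nasal overall workload
print(round(22.2 / 6, 2))  # 3.7, matches the oral overall workload
```

Read this way, the nasal total corresponds to roughly 73% of the 60-point maximum and the oral total to 37%, which is a descriptive rescaling and not a comparison against any acceptability threshold.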
In Section D, physicians reported increased operator uncertainty associated with limited nonverbal feedback during teleoperation, indicating a perceived limitation in assessing patient cues through remote interaction (1.2 ± 1.10).
Overall Evaluation and Feedback on the Robotic System
The comparative difficulty assessment (Section E-1) revealed distinct differences between the 2 procedures. The robotic oral examination (2.20 ± 0.84) was rated as only slightly more difficult than the conventional oral examination (1.60 ± 0.89, Section A-1). In contrast, the robotic nasal examination (8.60 ± 2.19) was perceived as substantially more difficult than the conventional procedure (3.00 ± 2.83, Section A-1). The increased difficulty in the nasal task may reflect the narrow working space and frequent need for camera switching.
Qualitative feedback from participants is summarized in Section E-2. All 5 physicians in this pilot highlighted the potential of the system for noncontact or remote examinations, particularly in situations involving infectious patients. In addition, they recognized its potential educational value, noting that the system could be effectively used to demonstrate endoscopic anatomy and instrument handling to nonspecialist trainees. However, every participant also pointed out significant limitations, most notably the slow operational speed and latency of the robotic response, which were regarded as major obstacles to clinical application.
Objective Performance Summary
In this pilot dataset, the nasal cavity procedure demonstrated a longer execution time (271.3 ± 67.1 seconds) than the oral procedure (105.3 ± 47.5 seconds). Camera-switching frequency was higher in the nasal task (2.6 ± 0.9 switches) than in the oral task (1.9 ± 0.54 switches).
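For a quick descriptive comparison, the nasal-to-oral ratio of each reported group mean can be computed directly. The sketch below uses only means stated in this article (execution time, camera switches, and NASA-TLX total); the helper name is illustrative. With 5 participants, these ratios are descriptive only and carry no inferential weight.

```python
# Group means as reported in this article, ordered (nasal, oral).
REPORTED_MEANS = {
    "execution_time_s": (271.3, 105.3),
    "camera_switches": (2.6, 1.9),
    "nasa_tlx_total": (44.0, 22.2),
}


def nasal_to_oral_ratio(nasal_mean: float, oral_mean: float) -> float:
    """Ratio of the nasal group mean to the oral group mean."""
    if oral_mean <= 0:
        raise ValueError("oral_mean must be positive")
    return nasal_mean / oral_mean


ratios = {
    name: round(nasal_to_oral_ratio(nasal, oral), 2)
    for name, (nasal, oral) in REPORTED_MEANS.items()
}
print(ratios)
# {'execution_time_s': 2.58, 'camera_switches': 1.37, 'nasa_tlx_total': 1.98}
```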
Discussion
Outpatient ENT procedures require close proximity and constant assistance, increasing infection and staffing burdens. To address these challenges, we developed a teleoperated dual-arm robotic system over 5 years (2021-2025). The system evolved into an outpatient-oriented prototype integrating a dual-joystick console, multi-view interface, and compact robotic arms optimized for outpatient ENT examinations. Unlike operating-room robotic systems designed for anesthetized patients with extensive setup requirements, this platform was purpose-built for outpatient examinations by a single physician. Furthermore, through collaboration with clinical and design experts, the examination-room layout, console configuration, and camera arrangement were refined to simulate a realistic clinical workflow.24-27 Importantly, the findings of this study should be interpreted as design-informing and workflow-informing observations rather than as evidence of clinical benefit or readiness.
This study is distinctive in that it utilized a fully self-designed, custom-built robotic platform specifically developed for ENT applications. Figure 4 illustrates an example of the teleoperation environment, visualized through the metaverse-based educational simulation platform developed as part of the study. The evaluation was conducted directly by otolaryngologists, using a custom questionnaire to assess operator-perceived feasibility and user experience in a simulated setting. Such instruments may serve as a practical framework for evaluating usability and workload in future interface-centered medical system studies.

Figure 4. Example of the teleoperation scenario visualized within the metaverse-based educational simulation environment developed for this study.
In comparison with standard-of-care tools, outpatient endoscopy is typically performed using rigid or flexible endoscopes, and channeled endoscopes can provide both visualization and instrumentation through a working channel. In this pilot, we intentionally used a flexible endoscope for visualization while instruments were handled by a separate robot arm. This design prioritizes teleoperation safety and versatility by minimizing rigid elements near the patient, leveraging the broader anatomical reach of flexible endoscopes, and simplifying the outpatient workflow by standardizing on a single endoscope type. Compared with channeled endoscopes, this approach utilizes widely available conventional flexible scopes with minimal additional hardware, which may reduce cost and facilitate deployment across diverse outpatient environments.
Participants found the system intuitive and easy to learn, particularly those familiar with joystick-based interfaces. Workload assessments indicated that nasal procedures imposed a substantially higher burden than oral tasks, which may be attributable to anatomical constraints and frequent camera switching. Nevertheless, because NASA-TLX total scores are calculated as the sum of six 0 to 10 subscales (maximum 60), the observed totals (nasal: 44.0; oral: 22.2) should be interpreted as relative workload signals rather than as meeting any predefined “acceptable” threshold. The main value of this pilot is the consistent nasal-versus-oral workload difference and the qualitative feedback that identifies specific drivers of workload (restricted speed, depth perception, and view management). Participants with prior gaming or driving experience tended to report lower workload and difficulty levels, implying that familiarity with similar control paradigms may facilitate adaptation more effectively than clinical seniority alone.
Although this study directly involved physician operators, the current model still represents the first prototype. As such, several limitations and valuable feedback were identified throughout the evaluation process. The main usability limitation was the conservatively restricted motion speed (5 mm/second), which was perceived as too slow for efficient task execution in the simulated setting. Although the ease of learning was rated favorably, overall satisfaction was moderate, indicating that further optimization of motion control and procedural efficiency is necessary. Visual feedback and situation awareness were identified as additional areas for improvement. While multi-view management itself was not problematic, the limited realism of the video field and reduced depth perception hindered spatial orientation, especially during nasal insertion. These findings emphasize the importance of optimizing camera angles and improving visual depth cues to enhance 3-dimensional perception during teleoperation.
Participants also emphasized the importance of human factors. Increased operator uncertainty due to limited nonverbal feedback (Section D) highlights the need to improve interaction design to support operator confidence.
Limitations
This study has several important limitations. First, the evaluation was conducted in a phantom environment rather than in real patients, and therefore, no patient-reported outcomes or clinical endpoints were collected. In addition, the combined questionnaire used in this study has not undergone formal psychometric validation (eg, reliability or construct validity testing) and was applied as an exploratory assessment tool within the scope of this pilot study. Second, the sample size was small (n = 5), and no formal sample size justification, power analysis, or stopping criterion was applied because this work was designed as an exploratory pilot feasibility assessment. Third, the objective time metric was defined only from a standardized entry pose to task completion and did not include full outpatient workflow components such as room setup, draping, docking, or sterilization steps; future studies should quantify these workflow times to enable fair comparisons with standard care. Finally, this pilot did not include a head-to-head experimental comparison against conventional outpatient endoscopy or channeled endoscopes, so the findings should be interpreted as preliminary workload and usability signals that inform subsequent controlled and patient-based evaluations.
In conclusion, this pilot study provides phantom-based operational characterization and workload assessment of a dual-arm teleoperated system for outpatient ENT procedures. The findings support hypothesis generation and iterative system refinement by identifying key workload drivers and interface limitations. Although workload and perceived difficulty were higher for nasal tasks, all scheduled phantom trials were completed, and the results offer initial human-factor signals to inform subsequent refinement prior to patient-based evaluation. Future work will include structured head-to-head phantom comparisons followed by prospective patient-based feasibility studies assessing safety and patient-reported experience measures.
Footnotes
Ethical Considerations
Ethics approval for this study was obtained from the Institutional Review Board of Hanyang University Guri Hospital (HUGH-2023-09-016-001).
Consent to Participate
Written informed consent was obtained from all participating physicians.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2021R1I1A4A01051258). This work was supported by the research fund of Hanyang University (HY-202500000001066) and Perazah, Inc.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data are available upon request.
Supplemental Material
Supplemental material for this article is available online.
References