Abstract
This study investigates the effect of using a systematic viewing protocol for scanning closed-circuit television (CCTV) streams during a simulation of remote nautical object (bridge/lock) control. In nautical object control, systematic viewing is assumed to mitigate the risk of observer errors, such as missing a road user in a CCTV stream. However, previous research has reported mixed results on the performance benefits of systematic viewing. A total of 42 professional operators operated a bridge control simulator in which, unknown to them, critical events had to be detected. Half of the operators received protocol instructions; the other half did not. The protocol group showed significantly longer dwell times and higher coverage of protocol-related CCTV areas than the no-protocol group. Critical event detection rates were identical for the two groups. While the time to first fixation on the critical events did not differ significantly between groups, the no-protocol group responded significantly faster to the events. Although applying the protocol neither improved nor impaired detection performance, the combined results suggest that the protocol group took more time to scan the scene before acting. Because nautical object control prioritizes safety over speed, this increase in dwell time and coverage may be beneficial.
Introduction
Remote operation of objects or systems is becoming increasingly common in various domains, including maritime (He et al., 2020; Kari & Steinert, 2021), aviation (Liang et al., 2020), automotive (Linkov & Vanžura, 2021), infrastructure (Liu et al., 2021), and process control (Priyanka et al., 2019). Although remote operation potentially offers many benefits (e.g., access to hard-to-reach locations and the associated flexibility for operators), it also brings human factors challenges (Goodall, 2020; Kari & Steinert, 2021). A critical challenge is the potential discrepancy between the operator’s remote perception of the environment and its actual conditions. In remote bridge control, closed-circuit television (CCTV) is used to observe road users. However, flat CCTV images are typically smaller than a local view of the bridge and require specific viewing approaches to mitigate the risk of observer errors (Dutch Safety Board, 2019).
The Netherlands has been a pioneer in remote nautical object (bridge/lock) control, with ergonomic workspace design for centralized object control emerging in the 1980s (Van Dam, 1981). Other countries have since followed suit (e.g., Moses, 2020; Shortridge & Abdo, 2022). Nevertheless, observer errors occasionally occur when using CCTV (Dutch Safety Board, 2016, 2019). To mitigate this risk, Dutch national frameworks on nautical object control provide guidelines for workspace design (including CCTV requirements) and operating procedures (CROW, 2024; Dutch Safety Board, 2019). In addition, the National Nautical Traffic Service Training (NNVO, 2023) has incorporated a systematic viewing approach for the workspace CCTV into its training materials. These materials specify that bridge/lock areas should be scanned in a set order before starting procedures (e.g., opening a bridge). However, empirical validation of the effect of systematic viewing on gaze behavior and performance in nautical object control is currently lacking.
Examining the effect of systematic viewing for nautical object control is important because previous research has shown that its effectiveness varies (e.g., Brams et al., 2019; Kok et al., 2016). In general, a review by Brams et al. (2019) links systematic scanning to higher detection rates and higher levels of expertise. For example, in radiology, systematic scanning improved lung tumor detection, with 30% of misses attributed to unsystematic scanning (Del Ciello et al., 2017). However, Kok et al. (2016) found no such performance effect, challenging the assumption that systematic viewing, by enhancing scan coverage, improves (diagnostic) performance. Furthermore, professionals may be just as susceptible to errors when adopting a systematic approach as when using an intuitive (non-analytical) one (Norman & Eva, 2010).
For nautical object control, qualitative research suggests that professional operators use different gaze patterns when operating locks, but expert evaluators did not always consider these patterns systematic (Stuut et al., 2025). Because systematic viewing principles are already integrated into operator training, whereas the outcomes of systematic viewing vary across domains, we test the effect of applying a systematic viewing protocol on gaze behavior and performance in a bridge control simulation task.
This study addresses the question: ‘How do systematic viewing instructions affect professional operators’ gaze behavior and performance during a simulated CCTV bridge control task?’ A bridge control simulator was used to test whether systematic viewing instructions enhance performance, that is, the detection of (more) critical events on a bridge. We expect that using a systematic viewing protocol results in longer dwell times and higher CCTV coverage. Furthermore, we expect better detection, faster responses, and shorter times to first fixation on critical events.
Methodology
Design
This study used a two-condition between-subjects design (protocol vs. no-protocol). Operators were asked to operate a remote bridge control simulator of the Dutch “Wantijbrug” (Figure 1). All operators performed the same two baseline trials, two practice trials, and six measurement trials (Figure 2). After the baseline trials, operators in the “protocol” group received systematic viewing instructions and practiced accordingly, while the “no-protocol” group received no explicit viewing instructions. The measurement trials comprised four non-critical trials (no target) and two critical trials in which a stationary pedestrian (the target) had to be detected.

Figure 1. Simulator set-up of bridge “Wantijbrug.”

Figure 2. Flow diagram of the session.
The six measurement trials were counterbalanced using a modified Latin square in which the critical trials (1) never occurred consecutively and (2) were excluded from the first two positions. The latter ensured that participants experienced all the different simulator settings in the first two trials (see also section “Materials and Setting”).
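The two ordering constraints can be checked mechanically. The Python sketch below is a hypothetical illustration, not the square actually used in the study: it enumerates every order of trials A to F that keeps E and F out of the first two positions and non-adjacent, and builds one possible assignment by rotating the non-critical trials and cycling the critical trials through the permitted positions. The function names and the rotation scheme are assumptions.

```python
from itertools import permutations

TRIALS = ("A", "B", "C", "D", "E", "F")
CRITICAL = {"E", "F"}  # trials containing the target pedestrian


def is_valid_order(order):
    """Return True if an order satisfies both counterbalancing constraints."""
    critical_slots = [i for i, trial in enumerate(order) if trial in CRITICAL]
    # Constraint (2): no critical trial in the first two slots (indices 0, 1).
    if min(critical_slots) < 2:
        return False
    # Constraint (1): critical trials never occur back to back.
    return all(b - a > 1 for a, b in zip(critical_slots, critical_slots[1:]))


def valid_orders():
    """Enumerate all orders of trials A to F that satisfy both constraints."""
    return [order for order in permutations(TRIALS) if is_valid_order(order)]


def assign_orders(n_participants):
    """Hypothetical assignment scheme (not the study's actual square).

    The non-critical trials A to D are rotated cyclically across
    participants, while E and F cycle through the permitted slot pairs.
    """
    non_critical = ["A", "B", "C", "D"]
    # (E slot, F slot), 0-indexed: slots 2 to 5, never adjacent.
    critical_pairs = [(2, 4), (2, 5), (3, 5), (4, 2), (5, 2), (5, 3)]
    orders = []
    for p in range(n_participants):
        shift = p % len(non_critical)
        filler = iter(non_critical[shift:] + non_critical[:shift])
        e_slot, f_slot = critical_pairs[p % len(critical_pairs)]
        order = []
        for slot in range(len(TRIALS)):
            if slot == e_slot:
                order.append("E")
            elif slot == f_slot:
                order.append("F")
            else:
                order.append(next(filler))
        orders.append(tuple(order))
    return orders


if __name__ == "__main__":
    print(len(valid_orders()))
    for order in assign_orders(6):
        print(" ".join(order))
```

Under these two constraints, 144 orders remain (3 non-adjacent position pairs × 2 orders of E and F × 4! orders of A to D). A standard 6 × 6 Latin square, in which every trial occupies every position once, would place E and F in the first two positions for some participants, which is why a modified square is needed.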
Participants
The participants were 42 professional, qualified operators who worked in the same control room in the Netherlands and were unfamiliar with operating the Wantijbrug. The two groups had similar characteristics (Table 1). Random initial assignment (based on registration order) was slightly adjusted to balance the groups. Participation was voluntary, and informed consent was obtained via a signed form. The study was approved by the Ethical Committee of the Faculty of Social and Behavioral Sciences at Utrecht University (code 24-0120).
Table 1. Protocol and No-Protocol Group Demographics.
Note. Initial familiarity with the NNVO (2023) viewing steps that underpin this study’s viewing protocol (see also section “Materials and Setting”).
Materials and Setting
Simulator
The experiment was conducted in a private office space. The simulator (OTS-light, Figure 1) replicated the Wantijbrug. The set-up consisted of three monitors (supervisory control system, CCTV streams, enlarged CCTV stream) on an adjustable desk, an emergency stop button, a joystick to control one pan-tilt-zoom (PTZ) camera, a mouse, a keyboard, and a stationary chair. The simulator lacked communication controls (radio, speakers), but because making announcements is part of their daily work, operators received brief instructions on how to do so. The first author initiated trials from a monitor behind the operator. Eye gaze data were collected with Tobii Glasses 3.0 (firmware 2.3.4) at a sampling frequency of 50 Hz.
Trials
In each trial, operators had to complete a bridge opening and closing procedure. All baseline and practice trials had the same simulator settings (Table 2). In measurement trials A to F, half of the trials varied in weather, road traffic intensity, and vessel type. Unknown to the operators, critical trials E and F featured a target pedestrian who stopped midway across the east or west side of the bridge and needed to be cleared. When a passerby pushed the target, it was covertly repositioned; simultaneously, a vessel appeared on-screen, accompanied by a call for bridge passage. The vessel was either a moving sailboat (circling while waiting) or a stationary, waiting fishing cutter. The appearance of the vessel marked the point from which the bridge control procedure could be initiated. To make the task more challenging, trial E had foggy conditions that limited visibility, and trial F had normal (clear) weather but took place during rush hour, when traffic could distract from the target. A trial typically lasted 8 min and started with the PTZ camera in its default position (shown enlarged on the right monitor in Figure 1).
Table 2. Simulator Settings for Baseline (BL), Practice (P), and Measurement Trials A to F.
Viewing Protocol
A viewing protocol was used to visually scan areas of the four bridge camera streams on the middle monitor (Figure 1). The areas were based on viewing steps by NNVO (2023) and specified with a nautical advisor/trainer to tailor them to the Wantijbrug. The protocol consisted of four steps: (1) viewing fast (motorized) traffic, (2) viewing slow traffic (i.e., cyclists and pedestrians), (3) assessing bridge/road boundaries for stationary objects (i.e., the target), and (4) repeating steps 1 and 2. Each step corresponded to specific CCTV areas (Figure 3): step 1 encompassed the blue areas (1.1 to 1.4), step 2 the green areas (2.1 to 2.4), and step 3 the yellow areas (3.1 to 3.12). All steps had to be executed in sequential numerical order, twice per trial: once before closing the barriers and once before opening the bridge.

Figure 3. CCTV coverage areas according to the viewing protocol.
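To make the required scanning order explicit, the protocol can be written as a small data structure. The Python sketch below is an illustration only: the area labels follow Figure 3, but the helper names and the idea of checking adherence as an ordered subsequence of fixated areas are assumptions rather than an analysis reported in this study.

```python
# Areas per protocol step, labelled as in Figure 3.
PROTOCOL_STEPS = {
    1: [f"1.{i}" for i in range(1, 5)],   # fast (motorized) traffic, blue areas
    2: [f"2.{i}" for i in range(1, 5)],   # slow traffic, green areas
    3: [f"3.{i}" for i in range(1, 13)],  # bridge/road boundaries, yellow areas
}

# Step 4 repeats steps 1 and 2, so one pass visits steps 1, 2, 3, 1, 2.
STEP_ORDER = [1, 2, 3, 1, 2]

# The full pass is executed twice per trial.
PASS_MOMENTS = ("before closing the barriers", "before opening the bridge")


def scan_sequence():
    """Expected order of area visits for one pass of the protocol."""
    return [area for step in STEP_ORDER for area in PROTOCOL_STEPS[step]]


def follows_protocol(observed_areas):
    """Hypothetical adherence check: do the observed (fixated) areas contain
    the expected sequence in order? Extra fixations in between are allowed."""
    remaining = iter(observed_areas)
    # `area in remaining` advances the iterator past the first match,
    # which turns this into an ordered-subsequence test.
    return all(area in remaining for area in scan_sequence())
```

One pass thus covers 28 area visits (4 + 4 + 12 + 4 + 4), or 56 per trial across the two passes. A subsequence test is deliberately lenient: it tolerates intermediate glances but fails as soon as an expected area is skipped or visited out of order.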
Procedure
During a 30-minute baseline phase, participants were familiarized with the simulator (layout and procedure) and completed two baseline trials. In the first baseline trial, simulator and procedural documentation were available; in the second, participants performed independently.
The protocol group then received instructions on the viewing protocol, followed by two practice trials (20–30 min in total). In the first practice trial, the protocol was available; in the second, participants performed independently. The first author monitored viewing behavior for adherence. The no-protocol group completed the same practice trials but received no specific viewing instructions. Practice was followed by a 15-minute break. Subsequently, operators who had practiced the protocol were asked to confirm their understanding and to apply it again during a 50–60 min measurement phase of six trials. The no-protocol group was simply asked to perform the six trials.
After completing all trials, participants were asked about their experiences and given the opportunity to ask questions. They were also instructed to keep session details confidential so as not to reveal information about the protocol and scenarios to their colleagues. The total session took about 2 hr.
Analysis
Gaze Behavior in Non-critical Trials A to D
Dwell time and CCTV coverage were computed for the non-critical trials only, as critical events could disrupt the instructed viewing behavior. Fixations were computed for all gaze data with the Tobii I-VT Attention gaze filter (velocity threshold: 100°/sec) in Tobii Pro Lab (version 24.21.435 [x64]). Assisted mapping, supplemented by manual mapping where necessary, was applied to times of interest (TOIs) spanning from the moment the operator initiated the procedure to stop land traffic until the start of the bridge opening. Snapshots of the normal (clear) and foggy weather settings were used to create areas of interest (AOIs). For coverage, the AOIs matched the viewing protocol areas (Figure 3, normal weather snapshot). For dwell time, four AOIs were created by merging the Figure 3 AOIs within each stream. Total dwell time and total scan coverage (Formula 1) were then computed for baseline trial 2 and trials A to D. After averaging trials A to D, a one-way analysis of covariance (ANCOVA) with the baseline score as covariate compared mean dwell time and coverage between groups, using IBM SPSS Statistics 29.0.2.0 (20).
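The exact coverage formula (Formula 1) is not given here. As an illustration only, the Python sketch below assumes a common definition: coverage is the percentage of the 20 protocol AOIs (1.1 to 3.12) that received at least one fixation within the TOI, and total dwell time is the summed duration of fixations on the four stream-level AOIs. Per-participant scores are then averaged over trials A to D. The stream labels and the fixation record format are hypothetical.

```python
from statistics import mean

# The 20 protocol AOIs of Figure 3.
PROTOCOL_AOIS = (
    [f"1.{i}" for i in range(1, 5)]
    + [f"2.{i}" for i in range(1, 5)]
    + [f"3.{i}" for i in range(1, 13)]
)

# Hypothetical labels for the four merged, stream-level dwell-time AOIs.
STREAM_AOIS = {"stream_1", "stream_2", "stream_3", "stream_4"}


def coverage_pct(fixated_aois):
    """ASSUMED coverage definition: % of protocol AOIs fixated at least once."""
    visited = set(fixated_aois) & set(PROTOCOL_AOIS)
    return 100 * len(visited) / len(PROTOCOL_AOIS)


def total_dwell_s(fixations):
    """Summed fixation duration (s) on the stream-level AOIs.

    `fixations` is a list of (aoi_label, duration_s) tuples; fixations
    outside the four streams are ignored.
    """
    return sum(duration for aoi, duration in fixations if aoi in STREAM_AOIS)


def mean_over_trials(scores_by_trial):
    """Average a participant's scores over the non-critical trials A to D."""
    return mean(scores_by_trial[t] for t in ("A", "B", "C", "D"))
```

Under this definition, repeated fixations on the same area raise dwell time but not coverage, which is why the two measures can diverge.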
Performance and Gaze Behavior in Critical Trials E and F
Detection rate, reaction time, and time to first fixation were computed for the critical trials, which involved a target to be detected. A detection was scored as correct when the target was called off the bridge. Chi-squared tests compared detection rates between groups. Reaction time was measured from the moment the vessel appeared to the operator’s response, defined as verbally responding to the pedestrian, clicking the announcement tab in the supervisory control system, or making an announcement to leave the bridge. Time to first fixation on the target AOI was computed from the moment the pedestrian stopped walking. Independent-samples t-tests (or Mann-Whitney U tests where parametric assumptions were not met) compared reaction time and time to first fixation between groups.
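Both timing measures reduce to the first qualifying event after a reference moment. The Python sketch below assumes a hypothetical event log in which every response of the three accepted types, and every fixation, carries a timestamp in seconds on a common clock; it is not the processing pipeline used in the study.

```python
def reaction_time_s(vessel_onset_s, response_times_s):
    """Time from vessel appearance to the first accepted response.

    `response_times_s` holds timestamps of any accepted response type
    (verbal response, announcement-tab click, or announcement).
    Returns None when no response follows the onset (target missed).
    """
    later = [t for t in response_times_s if t >= vessel_onset_s]
    return min(later) - vessel_onset_s if later else None


def time_to_first_fixation_s(stop_time_s, fixations, target_aoi="target"):
    """Time from the pedestrian stopping to the first fixation on the target AOI.

    `fixations` is a list of (start_s, aoi_label) tuples; `target_aoi` is a
    hypothetical label. Returns None if the target is never fixated.
    """
    starts = [s for s, aoi in fixations if aoi == target_aoi and s >= stop_time_s]
    return min(starts) - stop_time_s if starts else None
```

Taking the minimum over all accepted response types means whichever response comes first defines the reaction time, matching the "any of three responses" definition above.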
Results
Gaze Behavior in Non-critical Trials A to D
Dwell Time
Raincloud plots depict dwell times for both groups, with individual means shown as dots and distributional characteristics represented by density shapes, box plots, and whiskers (Figure 4). After excluding two outliers in the protocol group (participants P20 and P23; standardized residuals beyond ±2 SD) and controlling for pre-intervention (baseline) dwell time, an ANCOVA showed a significant difference in post-intervention log-transformed dwell time between groups, F(1, 37) = 69.4, p < .001, ηp² = 0.65. Based on adjusted means ± standard errors, log-transformed dwell time in relevant CCTV streams was significantly longer in the protocol group (M = 1.41, SE = 0.035; non-transformed: M = 28.4 s, SE = 2.15 s) than in the no-protocol group (M = 1.01, SE = 0.033; non-transformed: M = 11.1 s, SE = 2.04 s). The transformation ensured homogeneity of variance (Levene’s test: p = .082). Results were unchanged when outliers were included and when the non-transformed variable was analyzed, but these analyses violated assumptions (i.e., specify). Additionally, switching to the Tobii I-VT fixation filter (velocity threshold: 30°/sec) did not affect the outcomes.

Figure 4. Raincloud plots of log-transformed dwell time.
CCTV Coverage
After controlling for pre-intervention (baseline) coverage, an ANCOVA showed a significant difference in post-intervention coverage between groups, F(1, 39) = 30.6, p < .001, ηp² = 0.44. Based on adjusted means ± standard errors, coverage of relevant (protocol) areas of the CCTV streams was significantly higher in the protocol group (M = 68.3%, SE = 2.08%) than in the no-protocol group (M = 51.9%, SE = 2.08%). This difference is also visible in the coverage distributions (Figure 5), which show a clear negative skew in the protocol group and a positive skew in the no-protocol group. Switching to the Tobii I-VT fixation filter did not affect the outcomes.

Figure 5. Raincloud plots of CCTV coverage.
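The reported effect sizes and degrees of freedom can be cross-checked from the test statistics. Partial eta squared follows from F as ηp² = F × df_effect / (F × df_effect + df_error), and in a one-way ANCOVA with a single covariate, df_error = N − number of groups − 1. The short Python sketch below applies these standard identities to the dwell time and coverage results; it is a consistency check, not a reanalysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def ancova_error_df(n, n_groups, n_covariates=1):
    """Error degrees of freedom of a one-way ANCOVA."""
    return n - n_groups - n_covariates


# Dwell time: 42 operators minus the two excluded outliers.
print(ancova_error_df(40, 2), round(partial_eta_squared(69.4, 1, 37), 2))
# Coverage: all 42 operators.
print(ancova_error_df(42, 2), round(partial_eta_squared(30.6, 1, 39), 2))
```

Both lines reproduce the reported values (37 and 0.65; 39 and 0.44), which also confirms that the dwell time ANCOVA was run on 40 operators after the two exclusions.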
Performance and Gaze Behavior in Critical Trials E and F
Detection
Detection rates were identical between the two groups; each group had a total of 24 correct observations (hits) and 18 observer errors (misses). A chi-squared test of independence showed that the frequency of errors per operator (0, 1, or 2) did not differ between groups, χ²(2) = 0.53, p = .77, Cramér’s V = 0.11. When the two critical trials (E and F) were assessed separately, chi-squared tests of independence again showed no significant differences in hits/misses between groups (trial E: χ²(1) = 0.38, p = .54, Cramér’s V = 0.095; trial F: χ²(1) = 0.40, p = .53, Cramér’s V = 0.098).
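The chi-squared statistic and Cramér’s V follow directly from a contingency table. The per-operator error counts are not reported, but they can be back-calculated from figures given in this section and in the reaction time analysis: each group of 21 operators scored 24 hits, and 15 (protocol) and 14 (no-protocol) operators detected at least one target, which implies 9/6/6 and 10/4/7 operators with 0/1/2 misses. The Python sketch below uses these back-calculated counts purely as an illustration. For 2 degrees of freedom, the chi-squared p-value has the closed form exp(−χ²/2).

```python
from math import exp, sqrt


def chi_square(table):
    """Pearson chi-squared statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V effect size for a chi-squared test of independence."""
    return sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# Operators with 0, 1, or 2 misses; back-calculated, not reported counts.
misses_per_operator = [
    [9, 6, 6],   # protocol group
    [10, 4, 7],  # no-protocol group
]

chi2 = chi_square(misses_per_operator)
p_df2 = exp(-chi2 / 2)  # survival function of chi-squared with 2 df
v = cramers_v(chi2, 42, 2, 3)
print(round(chi2, 2), round(p_df2, 2), round(v, 2))
```

This reproduces χ²(2) = 0.53, p = .77, and V = 0.11. The per-trial values V = 0.095 and 0.098 likewise follow from cramers_v(0.38, 42, 2, 2) and cramers_v(0.40, 42, 2, 2).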
Reaction Time
For 13 of the 42 operators, no reaction time was available because they missed the target in both critical trials. The remaining data therefore came from 15 operators in the protocol group and 14 operators in the no-protocol group. The distributions of the reaction time data and individual observations are shown in Figure 6. An independent-samples t-test on square root-transformed reaction times showed that the no-protocol group responded to the target significantly faster (M = 6.86, SD = 1.57; non-transformed: M = 49.3 s, SD = 21.4 s) than the protocol group (M = 9.06, SD = 1.53; non-transformed: M = 84.2 s, SD = 29.5 s), t(27) = 3.83, p < .001, d = 1.42. The transformation ensured normality in the no-protocol group (Shapiro-Wilk test: p = .073).

Figure 6. Raincloud plots of square root-transformed reaction time.
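The reported effect size can be recovered in two ways: from the t statistic, d = t × √(1/n1 + 1/n2), or from the group means and SDs of the transformed data using a pooled standard deviation. The Python sketch below applies both standard formulas to the reported values; it assumes the pooled-SD (Cohen’s) definition of d, which the text does not state explicitly.

```python
from math import sqrt


def cohens_d_from_t(t, n1, n2):
    """Cohen's d for two independent groups, recovered from t."""
    return t * sqrt(1 / n1 + 1 / n2)


def cohens_d_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d from group means and SDs using the pooled SD."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd


# Square root-transformed reaction times: protocol (n = 15) vs. no-protocol (n = 14).
print(round(cohens_d_from_t(3.83, 15, 14), 2))
print(round(cohens_d_from_summary(9.06, 1.53, 15, 6.86, 1.57, 14), 2))
```

Both routes give d ≈ 1.42, and df = 15 + 14 − 2 = 27 matches the reported t(27).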
Time to First Fixation
Time to first fixation was analyzed for the same cases as the reaction time analysis (i.e., operators who detected the target). Both the original and the log-transformed data contained two outliers in the no-protocol group (P2 and P12). Because parametric tests gave different results depending on outlier inclusion and transformation, a non-parametric Mann-Whitney U test was applied to all original data. Figure 7 shows the time to first fixation distributions across groups. Time to first fixation on the target did not differ significantly between the protocol group (Mdn = 34.2 s) and the no-protocol group (Mdn = 20.8 s), U = 72.0, z = −1.44, p = .15.

Figure 7. Raincloud plots of time to first fixation.
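With 15 and 14 operators, the reported z value matches the standard large-sample normal approximation of U (mean n1 × n2 / 2, variance n1 × n2 × (n1 + n2 + 1) / 12) without tie or continuity corrections. The Python sketch below computes U from average ranks and the corresponding z and two-sided p. It is an illustrative check and assumes that the statistics software used this approximation, which the text does not state.

```python
from math import erf, sqrt


def mann_whitney_u(x, y):
    """U statistic for sample x versus y, using average ranks for ties."""
    pooled = sorted(x + y)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1..j
        i = j
    rank_sum_x = sum(rank_of[v] for v in x)
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def mann_whitney_z(u, n1, n2):
    """Normal approximation of U (no tie or continuity correction)."""
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return 1 - erf(abs(z) / sqrt(2))


z = mann_whitney_z(72.0, 15, 14)
print(round(z, 2), round(two_sided_p(z), 2))
```

The sketch reproduces z = −1.44 and p = .15 from the reported U = 72.0.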
Discussion
This study examined whether a systematic viewing approach by professional operators mitigates the risk of CCTV observer errors during a simulated remote bridge control task. Our results showed no higher detection rates of the targets (i.e., hard-to-detect pedestrians) when a protocol was used. Although the protocol group showed longer dwell times and higher coverage of protocol-related CCTV areas, operators without a protocol responded faster to the targets.
Our study extends previous research by investigating whether a systematic viewing approach can improve professionals’ performance, whereas most prior studies focused on existing levels of systematic viewing across expertise levels (see review by Brams et al., 2019). Our finding of increased coverage without a performance difference mirrors Kok et al. (2016), who reported higher scan coverage with systematic or full-coverage instructions among medical students, but no performance difference between the systematic and non-systematic groups. Furthermore, professionals made a similar number of observer errors regardless of protocol use. This aligns with Norman and Eva (2010), who suggest that experts are just as prone to errors when engaging in a systematic approach.
The current study has some limitations. Although the first author monitored adherence during practice, it remains to be analyzed to what extent the protocol group applied the viewing steps and whether their behavior was more systematic than that of the no-protocol group. On the one hand, the protocol group showed higher coverage, a prerequisite for systematic protocol viewing, which may indicate a more structured approach. On the other hand, coverage scores did not indicate full adherence, which could reflect limited eye-tracking sensitivity or limited practice time. Regarding the latter, it remains unclear how much practice is needed before such gaze strategies are applied intuitively. Further analyses are needed to determine the extent and effectiveness of protocol training, and whether it supports lasting behavioral transfer.
When evaluating performance, the number of observer errors underscores the complexity of careful CCTV scanning. Importantly, our experimental setting did not fully reflect operational conditions (e.g., an unfamiliar bridge, the absence of real-world consequences), which may explain the error rate and the similar detection scores between groups. Moreover, the faster responses in the no-protocol group suggest efficiency but not necessarily better performance, as remote bridge control prioritizes safety over speed. The slower responses in the protocol group may be attributed to a temporary increase in cognitive load or a sustained rise in mental effort. Nevertheless, the increased CCTV scanning following systematic viewing training, together with stable detection performance, supports the potential value of a viewing protocol, as more extensive scanning may increase the likelihood of spotting risks in nautical object control.
Acknowledgements
Many thanks to the operators who kindly agreed to take part in this study.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by Rijkswaterstaat, the executive organization of the Dutch Ministry of Infrastructure and Water Management. The views and opinions expressed in this paper do not reflect the official stance of Rijkswaterstaat or the Dutch government.
