Abstract

Objective

To determine how lower degree of automation (DOA) reliability impacts human response to a single higher-DOA failure in simulated air traffic control conflict detection.

Background

Higher-DOA systems apply higher levels of automation to later stages of human information processing. Higher-DOA is typically associated with better routine performance, and lower-DOA with better automation failure response. If both are provided and lower-DOA is reliable, it could support higher-DOA failure detection.

Method

Participants (N = 192) received a combination of lower-DOA and/or higher-DOA. Lower-DOA highlighted aircraft conflicts and near-misses, leaving participants to manually resolve conflicts. Higher-DOA resolved conflicts. Automation failed once. Participants were provided one of four types of automation: lower-DOA, where lower-DOA failed (L_F); higher-DOA, where higher-DOA failed (H_F); both lower- and higher-DOA, where only higher-DOA failed (LH_F); or both lower- and higher-DOA, where both failed (L_FH_F).

Results

When only the higher-DOA component of combined lower- and higher-DOA failed (LH_F), participants detected the automation failure 23.6 s faster and more accurately (miss rate difference = −.08) compared to higher-DOA only (H_F). However, more participants missed the automation failure when lower-DOA failed (miss rate difference: L_F = +.42; L_FH_F = +.15), compared to the H_F condition.

Conclusions

Reliable lower-DOA can support higher-DOA failure detection when both are presented. However, poorer automation failure detection with lower-DOA failure suggests participants over-relied on aircraft highlighting to direct attention to potential conflicts.

Applications

Providing both lower- and higher-DOA together could be beneficial when higher-DOA fails but lower-DOA remains reliable, but conversely, detrimental if lower-DOA also fails.

Keywords

human-automation teaming, air traffic control, automation failure detection, degree of automation

Humans increasingly work with automation designed to support their performance, with “automation” referring to technology that partly, or fully, completes tasks for the human. When automation is reliable, a human aided by automation would be expected to outperform the same unaided human because technology does not share inherent human limitations (e.g., boredom, fatigue, and information processing capacity limits). As automated support increases, the role of the human shifts from active task performance to supervisory oversight of automation (Bainbridge, 1983). The amount of automated support can be referred to as the degree of automation (DOA; Wickens et al., 2010), a nomenclature which ranks the level of autonomy and responsibility of automation versus the human (Sheridan et al., 1978) across four stages of information processing (information acquisition, information analysis, decision recommendation, and action execution; Parasuraman et al., 2000). Systems applying higher levels of automation to later stages of human information processing constitute higher-DOA. For example, lower-DOA may organize/integrate incoming information, but leave decisions and actions to the human, whereas higher-DOA recommends decisions and/or performs associated actions for the human. While the concept of DOA may simplify the complexities of human–machine interactions in modern work settings (Jamieson & Skraaning, 2018; Johnson et al., 2018), it has proven useful for exploring the tradeoffs between broad categories of automation design (Onnasch et al., 2014; Wickens, 2018).

Higher-DOA work systems are increasingly prevalent, with examples including air traffic control (ATC) systems which self-separate aircraft (Cai et al., 2024), and driverless cars which control a vehicle without intervention (Marcano et al., 2020). However, in both examples humans must monitor automation and return to manual control if it fails. A meta-analysis by Onnasch et al. (2014) indicated that higher-DOA leads to better performance when functioning reliably, but poorer performance when it fails, relative to lower-DOA. When higher-DOA fails, the human supervisor may be more disengaged or “out of the loop” and therefore slower/less accurate to respond (Endsley & Kiris, 1995; Manzey et al., 2012). Supervision can be further impaired by high trust of historically reliable automation, which reduces monitoring behavior (Bailey & Scerbo, 2007).

A persistent human factors challenge is to design systems which maximize the benefits of automation, while ensuring human operators can recover from automation failure. One potential solution is to lower the DOA provided. However, while this can improve failure recovery (e.g., Kaber et al., 2000; Li et al., 2014; Shen & Neyens, 2014), it may have limited utility in some work settings. For example, the benefits of higher-DOA (e.g., decision automation) in ATC include increased airspace predictability and capacity, and reduced controller workload—outcomes not necessarily achievable by providing lower-DOA alone (e.g., information integration). Another potential solution is adaptive automation, which can periodically disable higher-DOA support and return decision control to operators (Scerbo, 1996, 2018). Adaptive automation triggers DOA changes based on task performance, operator cognitive state variation (Griffiths et al., 2024), or operator preference (Chen et al., 2017). However, the issue remains that lowering DOA may not be appropriate in some safety-critical work settings—even temporarily. Therefore, automation design solutions that retain higher-DOA, while facilitating automation failure recovery, could be useful.

Using Lower Degree Automation to Support Higher Degree Failure Detection

Automated systems in the modern workplace often contain multiple layers of interacting DOA functions in their design (Jamieson & Skraaning, 2018). In ATC, an example would be higher-DOA which resolves aircraft conflicts for controllers (i.e., avoiding projected loss of aircraft separation; Cai et al., 2024; FAA, 2018) being reliant on underlying lower-DOA which determines all converging aircraft pairs at the same altitude. While some prior studies have provided higher-DOA without an underlying lower-DOA (e.g., Bowden et al., 2023; Calhoun et al., 2009; Griffiths et al., 2024; Lorenz et al., 2001), others provided participants with higher-DOA that also presents an underlying lower-DOA component (e.g., Manzey et al., 2012; Rovira et al., 2007, 2017; Sarter & Schroeder, 2001; Shen & Neyens, 2014; Tatasciore et al., 2020). For example, in an undersea contact classification task, Tatasciore et al. provided participants both information integration (lower-DOA) and action recommendations (higher-DOA). The lower-DOA displayed when and for how long contacts were in areas of interest, and the higher-DOA presented this information as well as recommendations to participants regarding contact classification. As is typical of such studies, when automation failed, both the lower- and higher-DOA failed to function. The current study extends the literature by investigating the effect of partial failures when both lower- and higher-DOA are provided. Specifically, we investigated how lower-DOA reliability impacted human response to a single higher-DOA failure in simulated air traffic control. To our knowledge, this has not been empirically tested in the human–automation interaction literature.

It is possible that presenting a reliable lower-DOA component alongside higher-DOA could support higher-DOA failure detection. Reliable lower-DOA could provide a diagnostic tool to support higher-DOA failure detection and may effectively increase the transparency of the higher-DOA by providing information which aids the operator in understanding the rationale underlying higher-DOA decisions/actions (Bhaskara et al., 2020; Gegoff et al., 2024; Van de Merwe et al., 2022), facilitating operator situation awareness (Endsley, 2017). In Tatasciore et al. (2020) for example, reliable lower-DOA could have displayed how long contacts were in areas of interest to assist participants in detecting that higher-DOA was currently advising incorrect contact classifications. However, as discussed further below, to the extent individuals rely on lower-DOA, the failure of lower-DOA may be detrimental to higher-DOA failure detection.

The Current Study

In simulated air traffic control, participants were responsible for supervising a sector of airspace and, amongst other tasks, determining whether aircraft would “conflict.” Conflicts occurred when aircraft violated the minimum lateral and vertical separation standard. We manipulated the provision of lower- and higher-DOA conflict detection support to examine impacts on participants’ ability to respond to a single automation failure in simulated ATC. Participants were provided with lower-DOA and/or higher-DOA. Lower-DOA highlighted both conflicts (aircraft pairs at the same altitude and on intersecting flightpaths) and “near-misses” (pairs at the same altitude that came close, but did not violate separation). With lower-DOA, participants were responsible for deciding if there was a conflict, given that not all lower-DOA highlighted pairs would conflict. Participants then manually intervened to resolve conflicts. The higher-DOA automatically resolved conflicts and ignored near-misses. With higher-DOA (regardless of whether accompanied by the presentation of lower-DOA), participants were notified when an automated action occurred, but otherwise not required to manually intervene.

Automation failed only once as this is more representative of automation in modern work contexts, such as ATC, in which failures are relatively rare (Bowden et al., 2023; Foroughi et al., 2023; Wickens et al., 2009). This is a critical design feature (see Loft, 2024) because repeated failures likely artificially increase human skepticism of automation (i.e., “first failure effect”; Merlo et al., 2000). While the tradeoff is reduced statistical power to detect significant effects, we argue that practical effect sizes are similarly relevant to examine in the case of single automation failures.

The nature of the automation failure was condition specific. When lower-DOA failed, one conflict pair presented late in the scenario was not highlighted upon entering the sector. When higher-DOA failed, this conflict was not resolved upon the aircraft pair entering the sector. Thus, both lower- and higher-DOA failures involve an automation error of omission. We recorded participants’ speed and accuracy to manually intervene to the single failure of automation, and their subjective experience by measuring perceived workload, trust in automation, and task disengagement.

Participants were assigned to one of four automation conditions: (1) lower-DOA, where the lower-DOA failed (L_F); (2) higher-DOA, where the higher-DOA failed (H_F); (3) combined lower-DOA and higher-DOA, where only the higher-DOA failed (LH_F); or (4) combined lower-DOA and higher-DOA, where both the lower-DOA and higher-DOA failed (L_FH_F). Experimental conditions and related hypotheses are summarized in Table 1. First, we compared the failure detection performance of participants who were provided with lower-DOA (L_F) to those provided with higher-DOA (H_F). We expected to replicate findings that higher-DOA leads to slower and/or less accurate automation failure detection, and lower perceived workload, than lower-DOA (Onnasch et al., 2014). Given that higher-DOA may induce more passive monitoring than lower-DOA, we expected participants to report higher task disengagement in the higher-DOA condition (Cummings et al., 2013; Pattyn et al., 2008).

Table 1.

Between-Participant Condition Manipulations Including Degree of Automation (DOA) Provided, and Whether Lower- And/or Higher-DOA Component Failed (_F).

Condition	DOA provided	Failure	Failure detection performance prediction
H_F	Higher	Higher	.
L_F	Lower	Lower	L_F better than H_F
LH_F	Lower + Higher	Higher	LH_F better than H_F
L_FH_F	Lower + Higher	Lower + Higher	?

Lower-DOA highlighted conflicts and near-misses, while higher-DOA automatically resolved conflicts and ignored near-misses. Predicted automation failure detection performance is compared to the higher-DOA only condition (H_F).

For the two combined DOA conditions (LH_F and L_FH_F), when both DOAs were functioning reliably, the higher-DOA resolved conflicts automatically and the lower-DOA highlighted near-misses as potential conflicts. Thus, when only the higher-DOA component failed (LH_F), the conflict aircraft pair was highlighted but not resolved by automation. When both lower- and higher-DOA failed (L_FH_F), the conflict was neither highlighted nor resolved. In the LH_F condition, the reliable lower-DOA should provide diagnostic information to draw participant attention to the failure of higher-DOA. We therefore predicted that the LH_F condition would have superior failure detection compared to the H_F condition. Participants should also report increased trust (measured postscenario) in the lower-DOA compared to higher-DOA in the LH_F condition given only the higher-DOA failed. It was less clear what effect both DOA components simultaneously failing (L_FH_F) would have on failure detection. Participants may be poorer at detecting the automation failure with L_FH_F compared to H_F if they were less likely to check aircraft pairs not highlighted by lower-DOA when searching for potential conflicts missed by the higher-DOA. Alternatively, the failure of lower-DOA highlighting for an aircraft pair observed by participants to be at the same altitude and on an intersecting path may be sufficiently conspicuous to draw attention and prompt participants to monitor the pair more carefully. In this case, participants in the L_FH_F condition may have superior failure detection than those in the H_F condition. Given this, we do not predict a direction for L_FH_F in Table 1.
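The presentation of the failure event in each condition, as described above, can be summarized in a small lookup. The following Python sketch is purely illustrative and is not the simulation's actual code; the names `Condition`, `CONDITIONS`, and `failure_event_display` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One between-participant automation condition (illustrative names)."""
    label: str
    lower_provided: bool   # lower-DOA highlighting available in the scenario
    higher_provided: bool  # higher-DOA conflict resolution available in the scenario
    lower_fails: bool      # lower-DOA omits the failure event
    higher_fails: bool     # higher-DOA omits the failure event


CONDITIONS = {
    "H_F": Condition("H_F", False, True, False, True),
    "L_F": Condition("L_F", True, False, True, False),
    "LH_F": Condition("LH_F", True, True, False, True),
    "L_FH_F": Condition("L_FH_F", True, True, True, True),
}


def failure_event_display(cond: Condition) -> dict:
    """Return what the participant sees for the single failure event."""
    highlighted = cond.lower_provided and not cond.lower_fails
    auto_resolved = cond.higher_provided and not cond.higher_fails
    return {"highlighted": highlighted, "auto_resolved": auto_resolved}


for label, cond in CONDITIONS.items():
    print(label, failure_event_display(cond))
```

Under this encoding, the failure event is never resolved automatically in any condition, and it is highlighted only in LH_F, which is the contrast the hypotheses rest on.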

Method

Participants

One-hundred and ninety-two undergraduate students (82 male, 109 female, 1 nonbinary) aged 17–51 (M = 19.9 years) participated for course credit and were randomly allocated to one of four conditions (n = 48 per condition). Participants also received a performance-based bonus of AU$6–10. Two participants’ data were replaced (one withdrew, and one did not follow task instructions). This research complied with the tenets of the Declaration of Helsinki and was approved by the Institutional Review Board at The University of Western Australia. Informed consent was obtained from each participant.

Air Traffic Control Simulation

The en-route ATC simulation was displayed on two 22-inch monitors. The radar display (Figure 1) was based on ATC-lab^Advanced (Fothergill et al., 2009), while the flight progress and event log display was based on Masalonis et al. (1997). Participants were responsible for aircraft in the inner sector (light grey), while the outer sector (dark grey) displayed incoming/outgoing aircraft. Aircraft were represented by icons with attached data blocks containing call sign (e.g., D10), model (e.g., A320), current altitude (e.g., 400, denotes 40,000 ft), cleared altitude, and speed (e.g., 50, denotes 500 knots). Aircraft traveled along fixed one-way flightpaths, before leaving the inner sector. Participants were responsible for a median of eight aircraft at one time. New aircraft entered the display every 40 s on average. Aircraft traveled at a constant cruising altitude unless directed to ascend by participants or automation to prevent a conflict. Aircraft speed was fixed. Aircraft positions were updated every second.

Figure 1.

The ATC radar display with lower-DOA highlighting (purple) for aircraft D29 and D10 (i.e., this pair are at the same altitude, 40,000 ft, and on an intersecting trajectory). This pair was the automation failure event. This example reflects the LH_F condition, where higher-DOA has failed to resolve the conflict while lower-DOA has correctly highlighted the pair as a potential conflict. Aircraft D86 is flashing blue to indicate it requires acceptance. A scenario timer and an automation status indicator were provided in the top right corner on the display.

The secondary display contained flight progress strips and the event log (Figure 2). Strips contained the call sign, model, altitude, and flightpath for accepted aircraft. Waypoints remaining along the flightpath were grey, while those already crossed were white. The event log contained a scrolling list of all previous actions taken by the participant and/or automation (i.e., acceptances, handoffs, and conflict resolutions).

Figure 2.

Secondary display containing aircraft flight progress strips (left) and event log (right).

Aircraft flashed blue when they approached the inner sector boundary to indicate they required acceptance. Participants pressed the “A” key and clicked on the aircraft to accept it. Aircraft stopped flashing and turned green once accepted. Aircraft flashed orange when leaving the inner sector. Participants handed-off departing aircraft by pressing the “H” key and clicking on the aircraft. Aircraft stopped flashing and turned black once handed-off. Participants had up to 20 s to accept or hand-off aircraft.

Conflicts occurred when aircraft violated the minimum separation standard (1,000 ft vertical, five nautical miles lateral). To help participants judge lateral separation, a yellow scale was provided in the bottom-left corner of the radar display depicting five nautical miles. Participants were instructed to prioritize preventing conflicts and to intervene to prevent them as quickly and accurately as possible. To do this, participants clicked one aircraft’s data block, which opened a dialogue box prompting them to select the conflicting aircraft. It did not matter which aircraft in the pair was selected first. If there was a conflict, one aircraft climbed 1,000 ft, and the action was added to the event log. If there was no conflict, an audio message notified participants of the false alarm. If a conflict was not prevented before loss of separation, the conflicting pair turned yellow, and an audio message notified participants of the miss. Conflicts could only be prevented after both aircraft involved had been accepted. Each scenario contained 10 conflicts and six near-misses. Near-misses involved aircraft at the same altitude and on converging flightpaths that came close to violating lateral separation (i.e., at the same altitude but ∼10 s away from being classified as a conflict laterally).
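The separation standard above can be expressed as a simple check. The following is a minimal sketch, assuming aircraft positions on a flat plane in nautical miles and strict inequalities at the boundaries; the function name `violates_separation` and the tuple representation are hypothetical and do not describe the simulator's implementation.

```python
import math

VERTICAL_MIN_FT = 1_000  # minimum vertical separation stated in the text
LATERAL_MIN_NM = 5.0     # minimum lateral separation stated in the text


def violates_separation(ac1, ac2):
    """Return True if two aircraft are inside both separation minima.

    Each aircraft is an (x_nm, y_nm, altitude_ft) tuple. The flat-plane
    coordinates and the strict '<' comparisons are assumptions made for
    illustration only.
    """
    x1, y1, alt1 = ac1
    x2, y2, alt2 = ac2
    lateral_nm = math.hypot(x1 - x2, y1 - y2)
    vertical_ft = abs(alt1 - alt2)
    return lateral_nm < LATERAL_MIN_NM and vertical_ft < VERTICAL_MIN_FT


# Two aircraft at 40,000 ft, about 4.2 NM apart: separation is lost.
print(violates_separation((0.0, 0.0, 40_000), (3.0, 3.0, 40_000)))
# After one aircraft climbs 1,000 ft, the pair is vertically separated.
print(violates_separation((0.0, 0.0, 41_000), (3.0, 3.0, 40_000)))
```

With strict inequalities, a 1,000 ft climb is exactly enough to restore vertical separation, which is consistent with the resolution action described in the text.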

Lower-DOA

Lower-DOA highlighted potential conflict aircraft pairs traveling at the same altitude and on an intersecting flightpath. As a result, both conflicts and near-misses were highlighted. When highlighted, the aircraft pair changed color and their data blocks were bolded (Figure 1). Highlight color alternated for successive potential conflict pairs (pair 1 = red, pair 2 = purple, etc.) given that up to two aircraft pairs could be highlighted simultaneously. Highlighting commenced once the second aircraft in a potential conflict pair was accepted. Highlighting ceased once the pair was no longer a potential conflict (i.e., altitude changed to prevent conflict, or one aircraft passed the common intersection). Participants were instructed that highlighting did not guarantee a pair would conflict since near-misses would also be highlighted.

Higher-DOA

Higher-DOA also resolved conflicts when the second aircraft in a conflicting pair was accepted. At this time, one aircraft ascended 1,000 ft to avoid the conflict and the event log reported the automated action.

Participants were instructed that both lower-DOA and higher-DOA were highly reliable, but not perfect, and that in the unlikely event that automation either performed an incorrect action or failed to perform an action, it was their responsibility to intervene to correct the failure.

Conditions

Participants completed one manual and one automated scenario (order counterbalanced). Acceptance and handoff tasks were performed manually in all scenarios and conditions. In the manual scenario, participants detected and resolved conflicts without automation. In the automated scenario, participants were supported by condition-specific automation that either highlighted potential conflicts (lower-DOA), automatically resolved conflicts (higher-DOA), or both. The conflict and near-miss events presented in the manual and automated scenarios were unique, except the ninth conflict event (which occurred 26 min 20 s into the 30-min scenario), which was presented in both scenarios (with call signs changed). This conflict is referred to as the failure event. Presenting this conflict twice allowed us to examine participant responses to the same event across manual and automated scenarios. When automation failed, it was absent for this event only, and otherwise 100% reliable. Conflicts not resolved by automation (i.e., manual scenarios or when only lower-DOA was provided) are referred to below as “other conflicts.” Participants were assigned to one of the four automation conditions: H_F, L_F, LH_F, and L_FH_F (Table 1). It was not possible to include an L_FH condition because a successful higher-DOA conflict resolution action left no potential conflicting aircraft pair for the lower-DOA to fail to highlight.

Questionnaires

Trust

Merritt’s (2011) 6-item “Trust in Automation” questionnaire was completed after automation scenarios. Trust was rated separately for lower- and higher-DOA. Participants responded to items such as “I trust the Highlighting” on a scale from 1 (strongly disagree) to 5 (strongly agree).

Workload

The National Aeronautics and Space Administration Task Load Index (NASA-TLX; Hart & Staveland, 1988) was completed after each scenario. The weighted composite of six workload subscales (mental demand, physical demand, temporal demand, effort, performance, and frustration) was used to calculate a total workload score between 0 and 100, with higher scores indicating higher workload.
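The weighted composite can be sketched as follows, assuming the standard Hart and Staveland procedure in which each subscale is rated from 0 to 100 and weighted by the number of times it is chosen across 15 pairwise comparisons. The text does not detail the scoring steps, so the function name `nasa_tlx_weighted` and the example ratings and weights below are hypothetical.

```python
SUBSCALES = (
    "mental", "physical", "temporal", "effort", "performance", "frustration",
)


def nasa_tlx_weighted(ratings, weights):
    """Weighted NASA-TLX score on a 0-100 scale.

    ratings: subscale -> rating from 0 to 100.
    weights: subscale -> tally from the 15 pairwise comparisons (0 to 5 each).
    """
    if set(ratings) != set(SUBSCALES) or set(weights) != set(SUBSCALES):
        raise ValueError("ratings and weights must cover all six subscales")
    if sum(weights.values()) != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


# Hypothetical participant data, for illustration only.
example_ratings = {
    "mental": 80, "physical": 10, "temporal": 60,
    "effort": 70, "performance": 40, "frustration": 30,
}
example_weights = {
    "mental": 5, "physical": 0, "temporal": 4,
    "effort": 3, "performance": 2, "frustration": 1,
}
print(nasa_tlx_weighted(example_ratings, example_weights))  # 64.0
```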

Task Disengagement

A 5-item task disengagement questionnaire was completed after each scenario. Items were from the inattention and disengagement subscales of the Multidimensional State Boredom Scale (items: 3, 10, 16, 20, and 23; Fahlman et al., 2013). Participants responded to items such as “I felt bored” on a scale from 1 (strongly disagree) to 7 (strongly agree). A 5-item average score was calculated.

Procedure

Participants first completed a 25-min audio-visual training presentation explaining the tasks, followed by a 30-min manual practice scenario during which they could ask the experimenter questions. Performance feedback on acceptance, handoff, and conflict prevention tasks was provided after practice and experimental scenarios. After training, participants completed two 30-min scenarios: one manual and one automated. Two unique ATC scenarios were created, but with an identical ninth conflict event (aircraft call signs were changed). The assignment of the two scenarios to manual or automated, as well as presentation order (automated first or second), was counterbalanced. The automated scenario was preceded by additional condition-specific instructions. The experiment lasted approximately 2.5 h.

Results

Jamovi (version 2.3.21) was used for data analysis. Performance on all measures was equivalent in the manual scenario (one-way ANOVAs, smallest p = .106). However, for completeness, full 4 (Condition: H_F, L_F, LH_F, L_FH_F) × 2 (Scenario: manual, automated) ANOVAs are first reported for each measure (except for single failure event miss rate which was a binary outcome variable). Planned comparisons that paralleled our hypotheses (Table 1) are then reported. Table 2 presents descriptive statistics for ATC task performance. Significance was set at p < .05. However, as noted earlier, with a single automation failure design it is important to consider practical effect sizes, not only statistical significance. Effect sizes for F-tests were estimated using partial eta squared (small = .01, medium = .06, and large = .14), and Cohen’s d (small = 0.20, medium = 0.50, and large = 0.80) was provided for t-tests (Cohen, 1992).

Table 2.

Means and Standard Deviations (in Brackets) for Manual and Automated Scenario Task Performance for the Four Automation Conditions (H_F, L_F, LH_F, and L_FH_F).

	Other conflicts miss rate	Other conflicts RT (s)	Near-miss false alarm rate	Failure event miss rate	Failure event RT (s)
Manual scenario
H_F	.05 (.10)	93.0 (29.7)	.35 (.25)	.02 (.14); N = 1	90.9 (39.0)
L_F	.05 (.07)	86.8 (29.3)	.30 (.25)	.04 (.20); N = 2	82.9 (36.4)
LH_F	.03 (.07)	92.6 (27.6)	.25 (.25)	.02 (.14); N = 1	99.3 (31.0)
L_FH_F	.03 (.06)	89.3 (33.0)	.24 (.21)	.00 (.00); N = 0	95.1 (37.9)
Automated scenario
H_F	.	.	.17 (.17)	.10 (.31); N = 5	120.2 (32.8)
L_F	.01 (.04)	76.8 (33.3)	.27 (.24)	.52 (.50); N = 25	110.7 (34.7)
LH_F	.	.	.26 (.19)	.02 (.14); N = 1	96.6 (36.2)
L_FH_F	.	.	.21 (.21)	.25 (.44); N = 12	119.7 (29.1)

“Other conflicts” refer to non-failure conflict events (only available in conditions when higher-DOA was absent). Since the manual scenario contained no automation, the “Failure event” refers to the identical manual conflict. For “Failure event miss rate,” N refers to the number of participants who either missed the single automation failure or failed to resolve the identical manual conflict. “Failure event RT” was only available for participants who did not miss the failure event.

Failure Event Performance

The automation failure was “missed” if participants did not intervene before the aircraft pair violated the separation standard. Table 2 shows that failure event miss rate in the manual scenario was near floor (≤.04). In the automated scenario, failure event miss rate was compared between conditions using chi-squared tests. Unexpectedly, more participants missed the single failure event in the lower-DOA condition (L_F) compared to the higher-DOA condition (H_F), M_diff = +.42, χ² = 19.4, p < .001. Nominally fewer participants missed the failure event in the combined condition where only higher-DOA failed (LH_F) compared to the H_F condition, M_diff = −.08, χ² = 2.84, p = .092. Nominally more participants missed the failure event in the combined condition where both DOAs failed (L_FH_F) compared to the H_F condition, M_diff = +.15, χ² = 3.50, p = .061.
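The chi-squared comparisons above can be checked from the miss counts in Table 2. The following is a minimal sketch, assuming a Pearson chi-squared test on a 2 × 2 table without continuity correction; the text does not state which variant was used, but this one reproduces the reported statistics. The function name `pearson_chi2_2x2` is hypothetical.

```python
import math


def pearson_chi2_2x2(miss_a, n_a, miss_b, n_b):
    """Pearson chi-squared (no continuity correction) comparing two miss counts.

    Returns (chi2, p) for a 2 x 2 table of missed vs. detected by condition.
    """
    observed = [[miss_a, n_a - miss_a], [miss_b, n_b - miss_b]]
    row_totals = [n_a, n_b]
    total = n_a + n_b
    col_totals = [miss_a + miss_b, total - (miss_a + miss_b)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Automated-scenario miss counts from Table 2 (n = 48 per condition).
for label, misses in [("L_F", 25), ("LH_F", 1), ("L_FH_F", 12)]:
    chi2, p = pearson_chi2_2x2(misses, 48, 5, 48)  # compared against H_F (5 misses)
    print(f"{label} vs H_F: chi2 = {chi2:.2f}, p = {p:.3f}")
```

With one miss in the LH_F condition, some expected cell counts are small (3 per condition), so the chi-squared approximation for that comparison should be read with caution.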

Due to the single failure design, failure event RT was unavailable if participants missed the conflict in either the manual or automated scenario. A 4 × 2 ANOVA showed a main effect of scenario, F(1,144) = 37.74, p < .001, η_p² = .21, no effect of condition, F = 1.65, p = .181, and an interaction between scenario and condition, F(3,144) = 6.55, p < .001, η_p² = .12. In the automated scenario, there was no difference in failure event RT between the L_F and H_F conditions, t(64) = 1.10, p = .276. Participants were faster to intervene to the failure in the LH_F compared to the H_F condition, M_diff = −23.6 s, t(88) = 3.24, p = .002, d = .68. There was no difference between the L_FH_F and H_F conditions, t < 1.
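The partial eta squared values reported with the F-tests above can be recovered from the F statistic and its degrees of freedom using the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the two significant effects; the function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of scenario on failure event RT, F(1,144) = 37.74.
print(round(partial_eta_squared(37.74, 1, 144), 2))  # 0.21
# Scenario x condition interaction, F(3,144) = 6.55.
print(round(partial_eta_squared(6.55, 3, 144), 2))   # 0.12
```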

Near-Miss False Alarms

False alarms occurred when participants attempted to intervene to near-miss aircraft. A 4 × 2 ANOVA showed a main effect of scenario, F(1,188) = 9.10, p = .003, η_p² = .05, no effect of condition, F < 1, and an interaction between scenario and condition, F(3,188) = 5.00, p = .002, η_p² = .07. In the automated scenario, more false alarms were made in the L_F compared to the H_F condition, M_diff = +.10, t(94) = 2.25, p = .027, d = .46. Participants also made more false alarms in the LH_F compared to the H_F condition, M_diff = +.09, t(94) = 2.54, p = .013, d = .52. There was no difference between the L_FH_F and H_F conditions, t(94) = 1.06, p = .290.

All Other Conflicts

In addition to the failure event, participants in the L_F condition were required to manually intervene to prevent the nine “other conflicts” presented in the automated scenario. To examine the extent to which providing lower-DOA benefitted conflict detection when functioning reliably, we compared L_F condition performance between manual and automated scenarios. Lower-DOA highlighting decreased the “other conflict” miss rate, M_diff = −.04, t(47) = 3.37, p = .002, d = .49, and RT, M_diff = −10.0 s, t(47) = 2.04, p = .047, d = .29, compared to manual detection without highlighting. There was no difference in near-miss false alarms, t < 1. Overall, this indicates that when reliable, lower-DOA improved participants’ ability to prevent conflicts.
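The text does not state how Cohen's d was derived from each t-test. As a hedged illustration, the common conversions d = t / √n for paired comparisons and d = t × √(1/n₁ + 1/n₂) for independent groups reproduce the reported values, so the sketch below assumes them; the function names are hypothetical.

```python
import math


def cohens_d_paired(t_value, n_pairs):
    """Effect size for a paired t-test, assuming d = t / sqrt(n)."""
    return t_value / math.sqrt(n_pairs)


def cohens_d_independent(t_value, n1, n2):
    """Effect size for an independent-samples t-test from t and group sizes."""
    return t_value * math.sqrt(1 / n1 + 1 / n2)


# Within-participant (manual vs. automated) comparisons in the L_F condition, n = 48.
print(round(cohens_d_paired(3.37, 48), 2))  # 0.49 (other conflicts miss rate)
print(round(cohens_d_paired(2.04, 48), 2))  # 0.29 (other conflicts RT)

# Between-condition near-miss false alarm comparisons, n = 48 per condition.
print(round(cohens_d_independent(2.25, 48, 48), 2))  # 0.46 (L_F vs. H_F)
print(round(cohens_d_independent(2.54, 48, 48), 2))  # 0.52 (LH_F vs. H_F)
```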

Questionnaires

Trust

Trust in lower-DOA (highlighting) and higher-DOA (conflict resolution) were only measured following the relevant automated scenario and are summarized in Table 3. Overall, one-way ANOVAs comparing eligible conditions showed no effect of condition on trust for either lower-DOA, F < 1, or higher-DOA, F < 1.

Table 3.

Means and Standard Deviations (in Brackets) for Postscenario Questionnaire Responses in Manual and Automated Scenarios, for the Four Automation Conditions (H_F, L_F, LH_F, and L_FH_F).

	Trust in lower-DOA	Trust in higher-DOA	Workload	Task disengagement
Manual scenario
H_F	.	.	43.5 (16.3)	3.40 (1.59)
L_F	.	.	46.9 (15.8)	3.27 (1.40)
LH_F	.	.	44.8 (18.2)	3.37 (1.33)
L_FH_F	.	.	44.6 (15.4)	3.46 (1.24)
Automated scenario
H_F	.	3.24 (.95)	35.3 (14.6)	3.93 (1.50)
L_F	2.80 (0.78)	.	40.6 (15.5)	3.81 (1.52)
LH_F	2.97 (1.01)	3.03 (.92)	34.1 (20.2)	4.33 (1.52)
L_FH_F	2.76 (0.98)	3.04 (.71)	34.1 (15.3)	4.24 (1.52)

Questionnaires regarding trust were only administered after the relevant automated scenario.

For the two combined DOA conditions, we can also compare trust in lower-DOA and higher-DOA components within-participants. For the LH_F condition, trust in lower-DOA did not differ from trust in higher-DOA, despite only higher-DOA failing, t < 1. For the L_FH_F condition, trust in lower-DOA was lower than trust in higher-DOA, despite both failing, M_diff = −0.28, t(47) = 2.14, p = .038, d = .31.

Workload

A 4 × 2 ANOVA showed a main effect of scenario, F(1,185) = 74.6, p < .001, η_p² = .29, replicating the well-established finding of lower perceived workload with automation. There was no effect of condition, F < 1, and no interaction, F < 1. In the automated scenario, there were no significant differences between any of the L_F, LH_F, or L_FH_F conditions and the H_F condition, smallest p = .092.

Task Disengagement

A 4 × 2 ANOVA showed a main effect of scenario, F(1,185) = 48.4, p < .001, η_p² = .21, indicating increased task disengagement with automation. There was no effect of condition, F < 1, and no interaction, F < 1. In the automated scenario, there were no significant differences between any of the L_F, LH_F, or L_FH_F conditions and the H_F condition, smallest p = .197.

Discussion

This study investigated how lower-DOA reliability influenced higher-DOA failure detection in simulated ATC, under conditions in which both lower- and higher-DOA were provided. Lower-DOA highlighted both conflicts and near-misses, and when provided without higher-DOA, left participants to decide whether there was a conflict and to manually intervene when necessary. When higher-DOA was provided (either with or without lower-DOA), participants were not required to intervene because automation resolved conflicts and ignored near-misses. In conditions where both lower- and higher-DOA were provided, when higher-DOA failed we varied whether lower-DOA remained reliable (LH_F) or also failed (L_FH_F). Participant speed and accuracy when manually intervening to the automation failure event in lower-DOA (L_F), and combined DOA conditions (LH_F, L_FH_F), was compared to the higher-DOA condition (H_F). Subjective workload, trust, and task disengagement were measured.

Lower-DOA vs Higher-DOA: A Benefit of Increased Degree of Automation?

We expected to replicate prior findings of higher-DOA leading to slower and/or less accurate automation failure detection than lower-DOA (Onnasch et al., 2014). This is known as the “lumberjack” effect, where increased DOA improves routine performance, but impairs performance under automation failure conditions (Sebok & Wickens, 2017). Instead, we found the opposite: participants were substantially more likely to miss the automation failure with lower-DOA (L_F) than with higher-DOA (H_F) (52% vs. 10% miss rate; no difference in RT).

To interpret this result, we re-considered what support lower-DOA was providing. Lower-DOA was designed to function similarly to “medium-term conflict detection” tools which draw controller attention to potential conflicts (FAA, 2023; Ruiz et al., 2013). The use of color is an effective way of both drawing attention and grouping relevant information in ATC. For example, Remington et al. (2000) showed that color-coding aircraft altitudes improved controller conflict detection. In line with this, we found that when it was reliable, lower-DOA (L_F) increased conflict detection accuracy and decreased RT compared to manual detection (i.e., no highlighting). This suggests that lower-DOA benefitted conflict detection under nonfailure conditions (as in Remington et al.), and thus functioned as intended.

Regarding subjective experience, we found automation provision (regardless of DOA) lowered perceived workload and increased task disengagement compared to manual conditions. However, when higher-DOA was compared to lower-DOA, we did not find the predicted reduction in perceived workload (Onnasch et al., 2014) or increase in task disengagement. This was unexpected, but it does indicate that variation in workload and/or task disengagement is unlikely to account for the poorer failure detection in the lower-DOA (L_F) compared to higher-DOA condition (H_F). Another possibility is that participants trusted lower-DOA more than higher-DOA and therefore supervised it less closely. This too was unsupported, as participants instead reported less trust in the lower-DOA relative to higher-DOA when provided with both DOAs simultaneously (LH_F and L_FH_F conditions).

We then re-considered how lower-DOA might influence visual scanning, a critical component underlying conflict detection in ATC (e.g., Gronlund et al., 1998; McClung & Kang, 2016; Remington et al., 2000). Lower-DOA assists operators by organizing incoming information but leaves decision-making/action-implementation to the human (Parasuraman et al., 2000; Wickens et al., 2010). Information processing assistance functions by directing attention towards relevant aspects of the task. However, the literature shows that visual attention is limited (Posner et al., 1980) and typically cannot be divided without cost (Pashler, 1994; Wickens et al., 2022). It is plausible then that poor failure detection with lower-DOA was due to attention capture by other concurrently highlighted aircraft pairs (Remington et al., 1992; Theeuwes, 1992). However, this explanation is unlikely, as the current study had little-to-no highlighting of other aircraft during the 173s failure event.

Our preferred explanation for poorer automation failure detection with lower-DOA (L_F) compared to higher-DOA (H_F) is that the provision of lower-DOA fundamentally changed the strategy that participants used to monitor automation: participants relied on aircraft highlighting to direct their attention to potential conflicts, which had negative implications when lower-DOA failed. As discussed below, outcomes from the combined DOA conditions support this conclusion.

Can Lower Degree Automation Improve Higher Degree Failure Detection?

We predicted that a combination of lower- and higher-DOA, where only the higher-DOA component failed (LH_F), would lead to better automation failure detection than a higher-DOA only condition (H_F). This was supported, with participants in the LH_F condition detecting the single automation failure event 23.6s faster and nominally more accurately (2% vs. 10% miss rate), presumably because the reliable lower-DOA highlighted the aircraft pair missed by higher-DOA and thus drew participant attention to it. This finding supports the premise that reliable lower-DOA can support the detection of higher-DOA failures by making the logic/rationale underlying higher-DOA more transparent, providing diagnostic information regarding automation predictability and performance (e.g., Bhaskara et al., 2020; Gegoff et al., 2024; van de Merwe et al., 2022).

While participants were provided with instructions that the lower- and higher-DOA components would function (or fail) independently, the two were clearly linked. With higher-DOA only (H_F), participants did not know if their observation of automation not intervening in the failure event was due to (1) there being no conflict (i.e., it was a near-miss) or (2) a failure of automation. With the addition of reliable lower-DOA (LH_F), near-misses were highlighted, thus providing participants with additional information regarding which aircraft pairs the higher-DOA had likely “considered” but not acted upon. In line with this, participants in the LH_F and L_F conditions made more conflict detection false alarms than the H_F condition (+.10 and +.09, respectively), likely because they were routinely assessing the aircraft highlighted by the lower-DOA.

The finding that participants in the L_FH_F condition were nominally less accurate than those in the H_F condition (25% vs. 10% failure event miss rate) further supports our interpretation that the provision of lower-DOA changed the strategy that participants used to monitor automation, encouraging them to rely on aircraft highlighting to direct their attention to potential conflicts. This reliance on lower-DOA had positive implications when lower-DOA remained reliable (LH_F condition), but negative implications when it failed (L_FH_F and L_F conditions), when compared to the H_F condition.
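The failure-event miss rates quoted in this Discussion (52%, 2%, and 25%, each against 10% for H_F) can be re-expressed as differences from the H_F baseline, which is the form used in the Abstract. The sketch below is only an arithmetic illustration using those reported percentages; the variable names are hypothetical.

```python
# Failure-event miss rates as reported in the Discussion.
miss_rate = {"H_F": 0.10, "L_F": 0.52, "LH_F": 0.02, "L_FH_F": 0.25}

baseline = miss_rate["H_F"]

# Difference in miss rate relative to the higher-DOA only condition (H_F).
diff_vs_hf = {
    cond: round(rate - baseline, 2)
    for cond, rate in miss_rate.items()
    if cond != "H_F"
}

for cond, diff in diff_vs_hf.items():
    print(f"{cond}: {diff:+.2f}")
```

The resulting differences (+.42 for L_F, −.08 for LH_F, +.15 for L_FH_F) match the values given in the Abstract: detection improved relative to H_F only when lower-DOA remained reliable.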

Future Directions and Conclusions

The current study presented an automation error of omission, such that when automation failed participants received no advice. However, automated systems can also provide incorrect information (errors of commission). Wickens et al. (2015) showed that automation commission errors can be more difficult for humans to detect than automation omission errors. Future research could therefore extend the current findings to situations where automation produces false alarms, or otherwise provides incorrect advice. For example, lower-DOA could highlight an aircraft pair that is in fact safe at different cruising altitudes, and/or higher-DOA could incorrectly intervene with near-miss aircraft.

Overall, this study provides evidence that, at least in the context of simulated ATC conflict detection with nonexpert participants, combined lower- and higher-DOA presentation could be beneficial when higher-DOA fails but lower-DOA remains reliable (i.e., automation failure detection was superior for the LH_F condition compared to the H_F condition). However, the outcomes also clearly indicate that both DOA components failing simultaneously (L_FH_F) led to poorer failure detection compared to higher-DOA alone (H_F). Moreover, the data indicate that participants relied on the lower-DOA support to highlight potentially conflicting aircraft, to the point where automation failure detection was substantially poorer with lower-DOA only (L_F) compared to higher-DOA only (H_F). Our findings support recent suggestions that the effects of increased DOA in modern work settings may not neatly fall on a continuum (Jamieson & Skraaning, 2018; Loft, 2024; Skraaning & Jamieson, 2023). Instead, system designers should carefully consider, as a function of specific task context, how different combinations of DOA presentation are likely to impact human monitoring strategies and recovery from automation failure.

Key Points

• This study examined how lower degree of automation (DOA) reliability impacted human response to a single automation failure in simulated air traffic control. When lower-DOA remains reliable, it could support higher-DOA failure detection.

• Participants received a combination of lower-DOA and/or higher-DOA support on a simulated air traffic control task where automation failed to resolve a single aircraft conflict.

• Reliable lower-DOA supported higher-DOA failure detection. Poorer automation failure detection with lower-DOA component failure suggests participants relied on aircraft highlighting to direct attention to potential conflicts.

• Providing both lower- and higher-DOA could be beneficial if higher-DOA fails but lower-DOA remains reliable, but conversely, detrimental if lower-DOA also fails.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Australian Research Council Discovery Grant DP160100575 and Future Fellowship FT190100812, both awarded to Loft. Correspondence concerning this article should be addressed to Vanessa K. Bowden, School of Psychological Science, The University of Western Australia, Crawley, Western Australia 6009, Australia (email: vanessa.bowden@uwa.edu.au).

ORCID iDs

Vanessa K. Bowden

Isabella Gegoff

Shayne Loft

Author Biographies

Vanessa K. Bowden is a senior lecturer at The University of Western Australia. She received her PhD in psychology from The University of Western Australia in 2012.

Isabella Gegoff is currently completing her PhD and master's in industrial/organizational psychology at The University of Western Australia.

Philippe J. Kilpatrick completed his master's in industrial/organizational psychology at The University of Western Australia in 2021. He currently works as a senior human performance specialist for AirServices Australia.

Shayne Loft is a professor at The University of Western Australia. He received his PhD in psychology in 2004 from the University of Queensland.

References

Bailey, N. R., & Scerbo, M. W. (2007). Automation-induced complacency for monitoring highly reliable systems: the role of task complexity, system experience, and operator trust. Theoretical Issues in Ergonomics Science, 8(4), 321–348.

Bainbridge (1983). Ironies of automation. In Analysis, design and evaluation of man–machine systems (pp. 129–135). Pergamon.

Bhaskara, Skinner, & Loft (2020). Agent transparency: A review of current theory and evidence. IEEE Transactions on Human-Machine Systems, 50(3), 215–224. https://doi.org/10.1109/thms.2020.2965529

Bowden, V. K., Griffiths, Strickland, & Loft (2023). Detecting a single automation failure: The impact of expected (but not experienced) automation reliability. Human Factors, 65(4), 533–545. https://doi.org/10.1177/00187208211037188

Cai, Liu, Shi, & Meng (2024). Consensus-based merging and spacing method for aircraft self-separation at intersections. Journal of Guidance, Control, and Dynamics, 47(6), 1097–1108. https://doi.org/10.2514/1.g007681

Calhoun, G. L., Draper, M. H., & Ruff, H. A. (2009). Effect of level of automation on unmanned aerial vehicle routing task. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 53(4), 197–201. https://doi.org/10.1177/154193120905300408

Chen, S. I., Visser, T. A., Huf, & Loft (2017). Optimizing the balance between task automation and human manual control in simulated submarine track management. Journal of Experimental Psychology: Applied, 23(3), 240–262. https://doi.org/10.1037/xap0000126

Cohen (1992). Statistical power analysis. Current Directions in Psychological Science, 1(1), 98–101. https://doi.org/10.1111/1467-8721.ep10768783

Cummings, M. L., Mastracchio, Thornburg, K. M., & Mkrtchyan (2013). Boredom and distraction in multiple unmanned vehicle supervisory control. Interacting with Computers, 25(1), 34–47. https://doi.org/10.1093/iwc/iws011

Endsley, M. R. (2017). From here to autonomy: Lessons learned from human–automation research. Human Factors, 59(1), 5–27. https://doi.org/10.1177/0018720816681350

Endsley, M. R., & Kiris, E. O. (1995). The out-of-the-loop performance problem and level of control in automation. Human Factors, 37(2), 381–394.

Fahlman, S. A., Mercer-Lynn, K. B., Flora, D. B., & Eastwood, J. D. (2013). Development and validation of the multidimensional state boredom scale. Assessment, 20(1), 68–85.

Federal Aviation Administration. (2018). NextGen implementation plan. U.S. Department of Transportation. https://www.faa.gov/nextgen/media/NextGen_Implementation_Plan2018-19.pdf

Federal Aviation Administration. (2023). Aeronautical information manual. U.S. Department of Transportation. https://www.faa.gov/media/70471

Foroughi, C. K., Devlin, Pak, Brown, N. L., Sibley, & Coyne, J. T. (2023). Near-perfect automation: Investigating performance, trust, and visual attention allocation. Human Factors, 65(4), 546–561. https://doi.org/10.1177/00187208211032889

Fothergill, Loft, & Neal (2009). ATC-labAdvanced: An air traffic control simulator with realism and control. Behavior Research Methods, 41(1), 118–127.

Gegoff, Tatasciore, Bowden, McCarley, & Loft (2024). Transparent automated advice to mitigate the impact of variation in automation reliability. Human Factors, 66, 2008–2024.

Griffiths, Bowden, Wee, Strickland, & Loft (2024). Operator selection for human-automation teaming: The role of manual task skill in predicting automation failure intervention. Applied Ergonomics, 118(1), Article 104288. https://doi.org/10.1016/j.apergo.2024.104288

Gronlund, S. D., Ohrt, D. D., Dougherty, M. R., Perry, J. L., & Manning, C. A. (1998). Role of memory in air traffic control. Journal of Experimental Psychology: Applied, 4(3), 263–280. https://doi.org/10.1037//1076-898x.4.3.263

Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in psychology (Vol. 52, pp. 139–183). North-Holland.

Jamieson, G. A., & Skraaning, G., Jr. (2018). Levels of automation in human factors models for automation design: Why we might consider throwing the baby out with the bathwater. Journal of Cognitive Engineering and Decision Making, 12(1), 42–49. https://doi.org/10.1177/1555343417732856

Johnson, Bradshaw, J. M., & Feltovich, P. J. (2018). Tomorrow’s human–machine design tools: From levels of automation to interdependencies. Journal of Cognitive Engineering and Decision Making, 12(1), 77–82. https://doi.org/10.1177/1555343417736462

Kaber, D. B., Onal, & Endsley, M. R. (2000). Design of automation for telerobots and the effect on performance, operator situation awareness, and subjective workload. Human Factors and Ergonomics in Manufacturing & Service Industries, 10(4), 409–430. https://doi.org/10.1002/1520-6564(200023)10:43.0.CO;2-V

Wickens, C. D., Sarter, & Sebok (2014). Stages and levels of automation in support of space teleoperations. Human Factors, 56(6), 1050–1061. https://doi.org/10.1177/0018720814522830

Loft (2024). Accelerating understanding of human response to automation failure. Journal of Cognitive Engineering and Decision Making, 18(197), Article 15553434241234108. https://doi.org/10.1177/15553434241234108

Lorenz, Di Nocera, Röttger, & Parasuraman (2001). The effects of level of automation on the out-of-the-loop unfamiliarity in a complex dynamic fault-management task during simulated spaceflight operations. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 45(2), 44–48. https://doi.org/10.1177/154193120104500209

Manzey, Reichenbach, & Onnasch (2012). Human performance consequences of automated decision aids: The impact of degree of automation and system experience. Journal of Cognitive Engineering and Decision Making, 6(1), 57–87. https://doi.org/10.1177/1555343411433844

Marcano, Díaz, Pérez, & Irigoyen (2020). A review of shared control for automated vehicles: Theory and applications. IEEE Transactions on Human-Machine Systems, 50(6), 475–491.

Masalonis, A. J., M. A., Klinge, J. C., Galster, S. M., Duley, J. A., Hancock, P. A., … Parasuraman (1997, October). Air traffic control workstation mock-up for free flight experimentation: Lab development and capabilities. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 41, No. 2, pp. 1379–1379). Los Angeles, CA: SAGE Publications.

McClung, S. N., & Kang (2016). Characterization of visual scanning patterns in air traffic control. Computational Intelligence and Neuroscience, 2016(1), Article 8343842. https://doi.org/10.1155/2016/8343842

Merlo, J. L., Wickens, C. D., & Yeh (2000). Effect of reliability on cue effectiveness and display signaling. In Proceedings of the 4th annual army federated laboratory symposium (pp. 27–31). DTIC.

Onnasch, Wickens, C. D., & Manzey (2014). Human performance consequences of stages and levels of automation: An integrated meta-analysis. Human Factors, 56(3), 476–488.

Parasuraman, Sheridan, T. B., & Wickens, C. D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 30(3), 286–297.

Pashler (1994). Dual-task interference in simple tasks: Data and theory. Psychological Bulletin, 116(2), 220–244. https://doi.org/10.1037/0033-2909.116.2.220

Pattyn, Neyt, Henderickx, & Soetens (2008). Psychophysiological investigation of vigilance decrement: boredom or cognitive fatigue? Physiology & Behavior, 93(1–2), 369–378.

Posner, M. I., Snyder, C. R., & Davidson, B. J. (1980). Attention and the detection of signals. Journal of Experimental Psychology, 109(2), 160–174. https://doi.org/10.1037/0096-3445.109.2.160

Remington, R. W., Johnston, J. C., Ruthruff, Gold, & Romera (2000). Visual search in complex displays: Factors affecting conflict detection by air traffic controllers. Human Factors, 42(3), 349–366. https://doi.org/10.1518/001872000779698105

Remington, R. W., Johnston, J. C., & Yantis (1992). Involuntary attentional capture by abrupt onsets. Perception & Psychophysics, 51(3), 279–290. https://doi.org/10.3758/bf03212254

Rovira, McGarry, & Parasuraman (2007). Effects of imperfect automation on decision making in a simulated command and control task. Human Factors, 49(1), 76–87. https://doi.org/10.1518/001872007779598082

Rovira, Pak, & McLaughlin (2017). Effects of individual differences in working memory on performance and trust with various degrees of automation. Theoretical Issues in Ergonomics Science, 18(6), 573–591. https://doi.org/10.1080/1463922x.2016.1252806

Ruiz, Piera, M. A., & Del Pozo (2013). A medium term conflict detection and resolution system for terminal maneuvering area based on spatial data structures and 4D trajectories. Transportation Research Part C: Emerging Technologies, 26(1), 396–417. https://doi.org/10.1016/j.trc.2012.10.005

Sarter, N. B., & Schroeder (2001). Supporting decision making and action selection under time pressure and uncertainty: The case of in-flight icing. Human Factors, 43(4), 573–583. https://doi.org/10.1518/001872001775870403

Scerbo, M. W. (1996). Theoretical perspectives on adaptive automation. In Parasuraman & Mouloua (Eds.), Automation and human performance (pp. 37–63). Lawrence Erlbaum.

Scerbo, M. W. (2018). Theoretical perspectives on adaptive automation. In Automation and human performance (pp. 37–63). CRC Press.

Sebok & Wickens, C. D. (2017). Implementing lumberjacks and black swans into model-based tools to support human–automation interaction. Human Factors, 59(2), 189–203. https://doi.org/10.1177/0018720816665201

Shen & Neyens, D. M. (2014). Assessing drivers’ performance when automated driver support systems fail with different levels of automation. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 58(1), 2068–2072. https://doi.org/10.1177/1541931214581435

Sheridan, T. B., Verplank, W. L., & Brooks, T. L. (1978, November). Human/computer control of undersea teleoperators. In NASA. Ames Res. Center The 14th Ann. Conf. on Manual Control.

Skraaning, G., Jr., & Jamieson, G. A. (2023). The failure to grasp automation failure. Journal of Cognitive Engineering and Decision Making, 18(9), Article 15553434231189375. https://doi.org/10.1177/15553434231189375

Tatasciore, Bowden, V. K., Visser, T. A., Michailovs, S. I., & Loft (2020). The benefits and costs of low and high degree of automation. Human Factors, 62(6), 874–896. https://doi.org/10.1177/0018720819867181

Theeuwes (1992). Perceptual selectivity for color and form. Perception & Psychophysics, 51(6), 599–606. https://doi.org/10.3758/bf03211656

van de Merwe, Mallam, & Nazir (2022). Agent transparency, situation awareness, mental workload, and operator performance: A systematic literature review. Human Factors, 66(1), Article 00187208221077804. https://doi.org/10.1177/00187208221077804

Wickens (2018). Automation stages & levels, 20 years after. Journal of Cognitive Engineering and Decision Making, 12(1), 35–41. https://doi.org/10.1177/1555343417727438

Wickens, C. D., Clegg, B. A., Vieane, A. Z., & Sebok, A. L. (2015). Complacency and automation bias in the use of imperfect automation. Human Factors, 57(5), 728–739. https://doi.org/10.1177/0018720815581940

Wickens, C. D., Hooey, B. L., Gore, B. F., Sebok, & Koenicke, C. S. (2009). Identifying black swans in NextGen: Predicting human performance in off-nominal conditions. Human Factors, 51(5), 638–651. https://doi.org/10.1177/0018720809349709

Wickens, C. D., Santamaria, Sebok, & Sarter, N. B. (2010). Stages and levels of automation: An integrated meta-analysis. In Proceedings of the human factors and ergonomics society 54th annual meeting (pp. 389–393). Human Factors and Ergonomics Society.

Wickens, C. D., McCarley, J. M., & Gutzwiller, R. S. (2022). Applied attention theory (2nd ed.). CRC Press, Taylor & Francis.

The Impact of Lower Degree Automation Reliability on Higher Degree Automation Failure Detection in Simulated Air Traffic Control
