Abstract
Firstly, I comment on the lack of support for the predictions of the lumberjack model when tested with professionally qualified operators in high-fidelity work simulations (Jamieson & Skraaning, 2020a). I highlight the advantages that Bayesian statistics provide for quantifying the degree of evidence for null hypotheses, issues concerning situation awareness measurement, and the alternative techniques available for studying experts. Secondly, I comment on the innovative taxonomy of automation failure presented by Skraaning and Jamieson (2024), pointing out some issues with overlapping definitions and a lack of cause-effect relationships. I then discuss the substantial opportunity this taxonomy presents to guide future research, such as the design of transparent automation. To conclude, I identify some other key problems regarding how we currently study human-automation teaming (e.g. presenting randomized automation failures unlinked to task context) and invite discussion from the research community on the relevance of computational modelling to this field of research.
Introductory Statements
Automated systems have profoundly improved safety and productivity in domains such as healthcare, transportation, finance, aviation, and defence (National Academies of Sciences Engineering & Medicine, 2022). Given rapid progress in AI, it is increasingly critical to improve human-automation/AI design. Most, if not all, industrial contexts which support economic growth and keep society safe utilize some form of automation/AI.
The concept of degree of automation (DOA; Wickens et al., 2010) describes the level of responsibility of automation (Sheridan & Verplank, 1978) across four stages of information processing (Parasuraman et al., 2000): information acquisition, information analysis, decision recommendation, and action execution. The combination of higher levels and later processing stages constitutes higher DOA. A meta-analysis reported that as DOA increased, workload decreased and performance improved, but situation awareness (SA) and performance in responding to automation failures degraded, supporting predictions of the lumberjack model (Onnasch et al., 2014).
In this paper, I provide opinions on two recent debates. The first is the debate regarding the extent to which DOA effects have replicated, or should be expected to replicate, from laboratory studies with naïve participants, which largely constitute the Onnasch et al. (2014) meta-analysis, to professionally qualified operators in high-fidelity work simulations (Jamieson & Skraaning, 2018; Wickens, 2018). Jamieson and Skraaning (2020a) reported what they viewed as weak evidence supporting predictions from the lumberjack model in the field. While acknowledging (as I do) the value of testing the ecological predictive validity of the lumberjack model, Wickens et al. (2020) critiqued Jamieson and Skraaning (2020a) on the basis of low statistical power, measurement issues, and their perceived downplaying of expert subjective experience (see the response by Jamieson & Skraaning, 2020b).
Another issue raised by Wickens et al. (2020) concerned the type of automation examined by Jamieson and Skraaning (2020a). In reply, Skraaning and Jamieson (2024) offered what I view as an innovative and thought-provoking initial taxonomy of automation failure. Skraaning and Jamieson (2024) highlight that much automation relies on an underlying system (sensors, equipment, functions, and logic) that can fail and disrupt automated system functions even while the automation performs, and is used by the human, as designed/trained (citing several airline incidents/accidents). I agree with Skraaning and Jamieson (2024) that our current definition of automation failure is narrow, and my second focus is to comment on the opportunity this initial taxonomy of automation failure presents to guide future research.
As an active researcher in this research space, I have found these debates thought-provoking and progressive and have learnt much. I thank the aforementioned authors for their innovative exchanges and have passed them on to my graduate students and post-docs.
Failure to Replicate the Lumberjack Model
Why not Use Bayesian Statistics?
It is important not to generalize broadly from initial failures to replicate the lumberjack model, given that the evidence base appears to comprise only a handful of studies of experts completing automation-aided high-fidelity tasks (e.g. Calhoun et al., 2009; Jamieson & Skraaning, 2020a; Metzger & Parasuraman, 2005).
The applied studies under the microscope (and, admittedly, the laboratory studies included in Onnasch et al., 2014, and much of my own research) have used frequentist statistics. Wickens et al. (2020) argue that the null effects reported by Jamieson and Skraaning (2020a) neither confirm nor disprove the lumberjack model, and Jamieson and Skraaning (2020b) replied that they see no logical reason for dismissing null findings as uninformative, particularly because they suspect that lumberjack effect sizes outside the laboratory are trivially small.
In one sense, I see it as a judgement call what can be interpreted from a null hypothesis test, and there are various methods to aid interpretation (see Cumming, 2012). However, the best way forward is to use Bayesian statistics in future research, and also to apply them retrospectively to the field studies with experts included in the Onnasch et al. (2014) meta-analysis and in Jamieson and Skraaning (2020a). I understand that in Human Factors, Bayesian statistics are relatively novel (as they were for me until I was educated by my graduate students/post-docs). Bayes factors (BFs) can be interpreted as the strength of evidence for one hypothesis over another (thus, including the null) (Vandekerckhove et al., 2018). The key advantage is the ability to quantify evidence for the null hypothesis (as opposed to frequentist statistics merely failing to reject it on the basis of a p-value point estimate; Wagenmakers et al., 2018). Bayesian statistics also provide graded benchmarks for expressing the strength of evidence for or against a hypothesis (e.g. BF>3 as ‘weak evidence’, BF>5 as ‘moderate evidence’, BF>10 as ‘strong evidence’, and BF>100 as ‘very strong evidence’; Etz & Vandekerckhove, 2016), avoiding the use of a single arbitrary cut-off (p < .05) for hypothesis testing.
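To make the logic concrete, below is a minimal sketch (my own illustration, not drawn from any of the studies cited above) of how a Bayes factor favouring the null can be approximated from Bayesian Information Criterion (BIC) values for a two-group comparison; the data, group labels, and effect size are hypothetical.

```python
# Minimal illustration (hypothetical data): approximating a Bayes factor for a
# two-group comparison via the standard BIC approximation,
# BF01 ~ exp((BIC_alternative - BIC_null) / 2).
import numpy as np

def bic_bayes_factor_01(group_a, group_b):
    """Return an approximate BF01 (evidence for the null of equal group means)."""
    y = np.concatenate([group_a, group_b])
    n = y.size

    # Null model: one grand mean (2 free parameters: mean, variance).
    rss_null = np.sum((y - y.mean()) ** 2)
    bic_null = n * np.log(rss_null / n) + 2 * np.log(n)

    # Alternative model: separate group means (3 parameters: two means, variance).
    rss_alt = (np.sum((group_a - group_a.mean()) ** 2)
               + np.sum((group_b - group_b.mean()) ** 2))
    bic_alt = n * np.log(rss_alt / n) + 3 * np.log(n)

    # Larger BF01 = more evidence for the null relative to the alternative.
    return np.exp((bic_alt - bic_null) / 2)

# Hypothetical automation-failure response times under low vs high DOA.
rng = np.random.default_rng(1)
low_doa = rng.normal(10.0, 2.0, size=40)
high_doa = rng.normal(10.2, 2.0, size=40)
print(f"Approximate BF01 = {bic_bayes_factor_01(low_doa, high_doa):.2f}")
```

In practice, tools such as JASP or the BayesFactor R package compute default-prior Bayes factors directly; the BIC shortcut above is simply intended to illustrate how evidence for the null, rather than a mere failure to reject it, can be quantified.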
Measuring the Situation Awareness Underlying Automation Monitoring
Jamieson and Skraaning (2020a) reported improved SA with increasing DOA (Important Parameter Assessment Questionnaire; IPAQ). Admittedly, there is no universally accepted theory/measure of SA (Pritchett, 2015). SA broadly refers to an individual’s or team’s understanding of the relevant elements of their task(s) and how these elements might change through environmental conditions or interactions with operator control actions. There is ongoing debate regarding the extent to which SA is a state of conscious, reportable knowledge (Endsley, 1995a, 2021) that can be measured by pausing the task and blanking information displays during a scenario (Situational Awareness Global Assessment Technique; Endsley, 1995b), or whether SA constitutes knowledge of where to find relevant task information through interactions with displays and/or team members (situated SA/distributed SA; Chiappe et al., 2012; Stanton et al., 2015), and thus whether SA is better measured without pausing/blanking displays (e.g. Situation Present Assessment Method; Durso & Dattel, 2004). The IPAQ used by Jamieson and Skraaning (2020a) did not follow the theoretical underpinnings of either, asking participants to rate whether process parameters were important or not (i.e. a dichotomous forced choice) after completion of the scenarios. The SA queries were developed by SMEs (reflecting an SA requirements analysis; Endsley & Jones, 2012), but given the dichotomous response scale and the timing of administration, it is questionable whether rating the importance of process parameters after scenario completion reflected the real-time SA required to monitor automation. I therefore agree with Wickens et al. (2020) that the IPAQ is unlikely to have measured real-time SA of the dynamically changing values of specific process parameters during the period leading up to and during automation failure (i.e. SA of current and predicted future system parameters). Of course, it is often difficult, and not well received, to pause a high-fidelity simulation or field exercise, or to intrude with on-line SA queries (Loft et al., 2015; Pierce, 2012), so in that sense I understand the SA measurement choice Jamieson and Skraaning (2020a) made.
Alternatives to Assessing Experts in High Fidelity Task Contexts
Wickens et al. (2020) noted that experts in Jamieson and Skraaning (2020a) reported decreased human-automation cooperation and out-of-the-loop performance issues with increased DOA. These effects were relatively strong, and industry reports often point to lumberjack model–related variables as contributors to workplace incidents/accidents (cf. Wickens et al., 2020). Although subjective reports can lack validity (Matthews et al., 2020), I agree with Wickens et al. (2020) that these expert opinions should be given weight, despite no change in objective performance. Experts learn to adapt to dynamic conditions, concurrent task demands, time pressure, and tactical constraints (Loft et al., 2009; Sheridan, 2002). Workload, SA, and performance are intricately related (Loft et al., 2023), but workload is not something imposed upon a passive operator; rather, it is managed through dynamic choice of work method (task prioritization, satisficing, task shedding, etc.; Gray & Fu, 2004; Loft et al., 2007; Simon, 1956; Sperandio, 1971). Choice of work method depends on metacognitive knowledge (i.e. the monitoring and control of cognition; Efklides, 2008). It is thus a critical finding that experts in Jamieson and Skraaning (2020a) held negative perceptions of increased DOA, because it indicates that the experts held the metacognitive knowledge that an increase in task demand or an unexpected event could exceed their capacity (the ‘red zone’ of workload; Strickland et al., 2019; Wickens et al., 2015), and thus could be problematic for managing automation. The overarching point is that understanding expert performance requires converging evidence from several different types of research methods (Dismukes, 2010), and while there is no immediate solution to transferring knowledge from the laboratory to the field (Loft, 2014; Stokes, 1997), techniques such as ethnographic observation, self-report, diary studies, and accident/incident reports can be very insightful.
The Skraaning and Jamieson (2024) Taxonomy of Automation Failure
Skraaning and Jamieson (2024) made the astute observation that current definitions of automation failure are either too narrow or too broad, and their initial taxonomy defined three types of automation failure. I commend Skraaning and Jamieson (2024) for their innovation (i.e. thinking outside the box). Researchers prepared to do this are extremely valuable for making larger than incremental scientific progress.
Skraaning and Jamieson (2024) contend that Elementary Automation Failures arise from isolated failures of components or functions localized to the automation (e.g. failures in automation control logic, programming errors, malfunctioning hardware, and loss of power) that lead to the unexplained loss of automation capability. I agree with Skraaning and Jamieson (2024) that the literature is replete with examples of Elementary Automation Failures. For example, in my own work with colleagues using simulated air traffic control, aircraft conflict detection automation fails to detect some conflicts (i.e. aircraft that will violate minimum separation in the future), but from the participants’ perspective there is no apparent underlying reason why the automation failed to detect a particular aircraft conflict (e.g. Gegoff et al., 2024), and the same can be said of other tasks we use, such as maritime surveillance (e.g. Hutchinson et al., 2023) and submarine track management (e.g. Tatasciore et al., 2020).
Skraaning and Jamieson (2024) introduce a common, but understudied, form of automation failure referred to as Systemic Automation Failures, which results from situationally triggered failures of integrated functions that support automation. A prime example Skraaning and Jamieson (2024) focus on is where sensors feed incomplete/incorrect information to automation and falsely trigger an automated function, or provide other forms of invalid data that create confusion from which operators are unable to recover (e.g. B737 MAX, Turkish Airlines Flight 1951). Other examples of Systemic Automation Failures include parallel automated systems performing in contradictory ways or the control/decision logic of automation containing latent (hidden) problems that cause automation failure. I agree with Skraaning and Jamieson (2024) that these forms of automation failure need to be distinguished from Elementary Automation Failures, which reflect the unexplained (non-identifiable) failure of automation controlling a single well-defined function.
Skraaning and Jamieson (2024) refer to a third category of automation failure as Human-Automation Interaction Breakdowns, which reflect non-alignment between the design of automation and human capabilities (e.g. concealed operation modes, misleading decision support, and automation presenting unrealistic capabilities to the human). Interestingly, and highly related to my musings in the paragraphs that follow, Skraaning and Jamieson (2024) also refer to Human-Automation Interaction Breakdowns that result from automation being unreliable and the underlying logic/workings of automation being unavailable to operators.
Definition Overlaps Across the Taxonomy Categories: Unavoidable?
Skraaning and Jamieson (2024) were prudent to point out that they present an initial taxonomy of automation failure and are open to discussion/further development. A current issue, in my opinion, is the degree of, and at times inconsistent, overlap between the definitions of the three types of automation failure (see Figure 1, p. 7; Skraaning & Jamieson, 2024). These overlaps were possibly unavoidable but are nonetheless noteworthy. To highlight an example, Elementary Automation Failures are largely referred to as resulting from failures in automation control, logic/programming, or malfunctioning hardware producing degraded/inaccurate output. What is the difference between that and the examples provided for Human-Automation Interaction Breakdowns resulting from automation providing misleading support to operators (low-reliability automation)? Skraaning and Jamieson (2024) refer to Systemic Automation Failures being caused by automation working as intended but not conveying its limitations, or by the automation having ‘hidden’ logic issues. What is the difference between that and the examples provided for Human-Automation Interaction Breakdowns caused by hidden modes of operation, failure modes that are not recognizable, or automation goals and capabilities being inaccessible to operators? Any one of such factors could contribute to an incident/accident (and many cross-relate), but the initial Skraaning and Jamieson (2024) taxonomy does not speak to how operators would be expected to respond differently to these categories of automation failure, which is ultimately the understanding required for practitioners to design work interventions. A potential variation to the initial taxonomy could be to more cleanly delineate the precipitating events (e.g. faulty sensor inputs, workload/fatigue, and environmental conditions) that cause Elementary Automation Failures and Systemic Automation Failures, and then describe how the design of automation (e.g. display transparency, reliability, mode salience, DOA, and communication) and organizational-level factors (e.g. training and safety culture) could moderate how operators cognitively and behaviourally respond (e.g. loss of SA, attention tunnelling, mode confusion, and false expectations).
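To illustrate how such a delineation might be operationalized for incident coding or study design, the sketch below is my own hypothetical illustration, not part of the Skraaning and Jamieson (2024) taxonomy; the field names and example values are assumptions for exposition only.

```python
# Hypothetical sketch: one way to encode the delineation suggested above
# (precipitating events -> failure category, with design/organizational moderators
# and operator responses recorded separately). All field names are illustrative.
from dataclasses import dataclass, field
from typing import List

@dataclass
class AutomationFailureCase:
    precipitating_events: List[str]      # e.g. "faulty sensor input", "loss of power"
    failure_category: str                # e.g. "elementary" or "systemic"
    design_moderators: List[str] = field(default_factory=list)    # e.g. "low transparency"
    org_moderators: List[str] = field(default_factory=list)       # e.g. "limited training"
    operator_responses: List[str] = field(default_factory=list)   # e.g. "mode confusion"

case = AutomationFailureCase(
    precipitating_events=["faulty sensor input"],
    failure_category="systemic",
    design_moderators=["low display transparency", "high DOA"],
    org_moderators=["limited failure-mode training"],
    operator_responses=["mode confusion", "delayed intervention"],
)
print(case)
```

Coding incidents and experimental scenarios in some such structured form would make it easier to test whether the different failure categories are in fact associated with different operator responses.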
Further Issues Common to Understanding Automation Failure
Inspired by the innovation of Skraaning and Jamieson (2024), I believe there are other core issues in the manner in which we study human-automation teaming. Skraaning and Jamieson (2024) identify Elementary Automation Failures as the category on which researchers have focused almost exclusively (i.e. presenting unexplained losses of automatic function). Indeed, Systemic Automation Failures have more potentially identifiable underlying causes and patterns of occurrence. Nonetheless, a major limitation of human-automation research (including my own) is that participants are typically exposed to fixed quotas of randomized automation failures, allowing little opportunity to develop an understanding of the automation they are using and limiting their capacity to predict when automation failures might occur. In most studies, the only learnable context for system reliability is the frequency of automation failure. This contrasts sharply with my observations in aviation and defence field settings, in which automation reliability is dynamic/context-driven, allowing a nuanced human understanding of automation capabilities and limitations that enables prediction of when intervention is required. Indeed, trust calibration becomes increasingly sophisticated with expertise. Trust is multifactorial and affected by a range of operator characteristics, contexts, and automation characteristics, but perceived reliability is a major driver (Hoff & Bashir, 2015). The Human-Automation Trust Expectation Model (HATEM) recently published by Carter et al. (2024) asserts that trust in automation becomes increasingly calibrated over time through human understanding of automation reliability. Trust thus, at least partly, reflects the difference (closeness) between experienced automation reliability and expected reliability (prediction error). Observed automation performance is evaluated against expectation and either increases or decreases confidence in future predictions, thereby refining understanding. We need to keep this in mind, regardless of the form of automation failure taxonomy on which consensus is formed.
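The prediction-error idea can be illustrated with a simple delta-rule update. The sketch below is my own generic illustration rather than the HATEM itself, and the learning rate, starting expectation, and outcome history are hypothetical.

```python
# Generic illustration (not the HATEM itself): a delta-rule sketch of the idea that
# expected automation reliability is refined by prediction error against experience.
def update_expected_reliability(expected, observed_success, learning_rate=0.1):
    """Nudge expected reliability toward each observed outcome (1 = correct, 0 = failure)."""
    prediction_error = observed_success - expected
    return expected + learning_rate * prediction_error

expected = 0.5                           # naive starting expectation
outcomes = [1, 1, 1, 0, 1, 1, 1, 1]      # hypothetical automation performance history
for outcome in outcomes:
    expected = update_expected_reliability(expected, outcome)
print(f"Calibrated expectation after experience: {expected:.2f}")
```

The point of the illustration is simply that expectations can only become calibrated when operators are given reliability experience that has learnable structure, which fixed quotas of randomized failures do not provide.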
Also common to any taxonomy of automation failure (but admittedly somewhat contradictory to my previous paragraph) is the fact (first recognized by Molloy & Parasuraman, 1996) that studies typically (including Jamieson & Skraaning, 2020a) present multiple automation failures within a single testing session, with only a handful of studies examining responses to a single automation failure (e.g. Bailey & Scerbo, 2007; Bowden et al., 2023, 2024; Metzger & Parasuraman, 2005). In the modern workplace, humans increasingly monitor near-perfect automated systems (Foroughi et al., 2023). While detection of a first automation failure is often poor, detection of subsequent failures improves (Merlo et al., 2000). We need research that evaluates the detection of rare automation failures.
As discussed earlier, Skraaning and Jamieson (2024) highlight the importance of automated system goals, modes, logic/rationale, and capabilities being available to operators. This is also critical for Elementary Automation Failures, in addition to the other two failure categories. For example, if there are identifiable failures in automation control logic or programming errors, they should be made transparent. Further, given the focus by Skraaning and Jamieson (2024) on Systemic Automation Failures using aviation examples (Qantas Flight 72, Turkish Airlines 1951) that resulted from incorrect sensor information being fed to automated cockpit systems, I was surprised that Skraaning and Jamieson (2024) did not mention the automation transparency literature. Automation transparency is intended to aid understanding of the rationale underlying automation. A leading model is the Situation Awareness Agent-Based Transparency model (Chen et al., 2014), which outlines three levels of transparency: the automation’s goals, purpose, and intentions (Level 1); the automation’s rationale underlying its information/advice (Level 2); and the automation’s projected future outcomes if the information/advice is followed, together with any associated uncertainty (Level 3). Reviews/meta-analyses indicate that automation transparency can improve SA and automation use, including recovery from automation failure (Bhaskara et al., 2020; Sargent et al., 2023; Van de Merwe et al., 2024). Yet at the same time, the types of automation failure in these transparency studies, such as those using uninhabited vehicle management/control tasks (e.g. Griffiths et al., 2024; Loft et al., 2023; Mercado et al., 2016; Stowers et al., 2020; Tatasciore & Loft, 2024), do not fit neatly into the Skraaning and Jamieson (2024) taxonomy. Automation failures (incorrect automated decision advice) in these studies stem from changes in military commander intent, changes in vehicle capability (e.g. payload, speed, and fuel reserve), and environmental constraints (e.g. fog, wind, and road blocks). The automation functions as intended and the sensors feeding the automation are not faulty.
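As a concrete illustration of how advice might be surfaced at the three transparency levels described above (Chen et al., 2014), the sketch below is my own hypothetical example for an uninhabited vehicle routing recommendation; the content, field names, and numbers are illustrative assumptions, not drawn from any cited study.

```python
# Hypothetical sketch: surfacing automated route advice at increasing transparency levels.
advice = {
    "recommendation": "Reroute vehicle via waypoint B",
    "level_1_goal": "Minimise transit time while avoiding restricted airspace",
    "level_2_rationale": "Current route blocked by reported fog; waypoint B adds 4 min",
    "level_3_projection": {"eta_min": 42, "fuel_margin_pct": 18, "uncertainty": "moderate"},
}

def render_advice(advice, transparency_level):
    """Return the advice text shown to the operator for a given transparency level (1-3)."""
    lines = [advice["recommendation"]]
    if transparency_level >= 1:
        lines.append("Goal: " + advice["level_1_goal"])
    if transparency_level >= 2:
        lines.append("Rationale: " + advice["level_2_rationale"])
    if transparency_level >= 3:
        p = advice["level_3_projection"]
        lines.append(f"Projection: ETA {p['eta_min']} min, fuel margin {p['fuel_margin_pct']}%"
                     f" ({p['uncertainty']} uncertainty)")
    return "\n".join(lines)

print(render_advice(advice, transparency_level=3))
```

In such tasks the “failure” is incorrect advice arising from changed commander intent, vehicle capability, or environmental constraints, which is precisely why it does not map cleanly onto the Skraaning and Jamieson (2024) categories.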
Conclusion
I conclude by thanking Jamieson, Skraaning, and Wickens (and their colleagues; e.g. Onnasch) for their innovative and enlightening discussion. It is obviously critical that the research we do has relevance to operational (field) settings, and we need to continue to think about how to improve the manner in which we do that. I hope I have at least made an incremental contribution in this paper with my opinions. I end by inviting the research community to comment on the relevance of a subset of my own recent work with colleagues regarding computational modelling of how humans integrate automated advice to make decisions (e.g. Strickland et al., 2021; Strickland et al., 2023; see the review by Boag et al., 2023), how humans learn to track variation in automation reliability (Strickland et al., 2024), and, more generally, our computational models of the human cognitive control and capacity mechanisms underlying workload management and multi-tasking in complex dynamic task environments (see the review by Boag et al., 2023). Does the research community see this work as having relevance to applied environments and to understanding automation failure, and if not, how could it be made more impactful from a practical-use perspective?
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by an Australian Research Council Future Fellowship (FT190100812) awarded to Loft.
Shayne Loft is Professor at the University of Western Australia and currently holds an Australian Research Council Future Fellowship (Research). He received his PhD in Experimental Psychology/Human Factors in 2004 from the University of Queensland. He has 119 refereed publications in Human Factors/Applied Cognitive Psychology.
