Abstract
How should we characterize automation failures? From the point of view of complex, multi-agent operations, I argue that their definition and modeling are most useful when they account for the important drivers of operational performance and safety. This calls us beyond a focus on one human and one system performing one task, to a team simultaneously executing many activities in which things are always failing. Much of this activity is defined by the dynamics of the work environment, which can be modeled and predicted. Further, having a human legally responsible for the outcome of the automation’s actions significantly colors the dynamic.
Reflecting on how to define and model automation failures, I’m reminded of the adage that “all models are wrong, some are useful.” My personal interests focus on safety in complex, multi-agent operations, such as commercial air transport. Thus, a characterization of automation failure is most useful to me when it accounts for the following drivers of operational performance and safety.
First, things go wrong all the time. Systems break; teammates mess up or miss something; unusual conditions pop up that require unique responses or that stretch normal practices and procedures, human capabilities, the operating range of the hardware, and the algorithms driving the software. The human operators are not surprised that failures happen: perversely, dealing with the abnormal is part of their normal activities.
Second, the human operators’ job is to handle it all. I say “handle” to evoke the evolving dynamic of controlling the operation in its entirety, fending off, mitigating, and resolving concerns as they go. Notably, if automation fails—for any of the reasons noted by Skraaning and Jamieson in their paper—the human operators’ immediate task is not to disambiguate the source, such as figuring out whether the failure is in the software logic of the control system or in the physical actuator that it commands, nor to try to fix it—their immediate task is to regain and maintain a stable state, often by disengaging the automation and executing its actions themselves. Once a new stable state is reached, the operator can then assess the best course of action from there on out, if and when there is time available; a common choice is leaving the automation disengaged until maintenance personnel can safely investigate. Thus, the different causes of automation failure are often irrelevant to the operator.
Third, the operation requires multiple tasks to be executed by multiple people using multiple tools (from the mundane to the highly automated). The lab standard for human-automation interaction typically examines one person and one automated system performing one primary task (perhaps with secondary tasks as distractors or modulators of workload), or a small perturbation thereof. But in my experience the cockpit—as with many other domains—involves the complex interplay of many agents performing many tasks. Some tasks are independent of each other, but others may need to be concurrent or sequential. This gives at least some of the humans—particularly the team leads, such as the captain of the aircraft or ship—the flexibility to decide (and the additional task of deciding) what tasks will be done by whom and using which tools—including automation.
In a broader sense, this may involve choices of strategies about how, and how well, to do each task, such as whether to strategically devote attention to finding an enduring resolution to an issue (e.g., create an optimal route to the destination), or to more tactically implement an evolving pattern of interim actions and corrections (e.g., point generally towards the destination and keep correcting the flight path using rules of thumb). (Here, let me recognize Hollnagel’s work on Cognitive Control and the Efficiency-Thoroughness Trade-Off as my inspiration [Hollnagel, 1993, 2017].) Different strategies may use different tools to perform the same set of tasks: some purposefully rely on automation for reliable performance; some rely on automation in the belief (hope) that it will work, but monitor it closely; and some conclude that the automation, given its performance, its reliability, and the difficulty of implementing, monitoring, and intervening in it, is just not worth the effort to invoke. (Here, let me recognize Kirlik’s insights on when automation can, and should, remain unused [Kirlik, 1993].)
Finally, one or more human operators are ultimately legally responsible for the outcome, which is not a passive role: it demands active supervision. Many of the constructs we use in human-automation interaction research focus exclusively on whether the human or the machine is granted “authority to execute” specific actions. However, unless the activity is so completely automated that the human is not considered “responsible for the outcome,” the interaction is closer to what we commonly term “mixed initiative,” further colored by the aura of responsibility: regardless of what the automation has the theoretical capability to execute, the responsible human is required to monitor it in sufficient detail, and with sufficient frequency, that s/he can intervene as fast as the automation (for whatever reason) can effectively fail to get the job done. For example, while commercial air transport is often portrayed as “highly automated,” I would characterize only two significant systems as “fully automated.” First, with fly-by-wire/fly-by-light, the pilot’s control inputs on the yoke/side-stick are automatically interpreted by the flight control computer as intentions for the motion of the aircraft, and translated into commands for the actual control surfaces. Second, the Full Authority Digital Engine Controller (FADEC) reduces the task of starting and managing the engines, which once required a third flight crew member, to one where the remaining crew of two starts each engine by toggling a simple switch. The ability of these systems to function independently is particularly scrutinized in certification, and also by the pilots when they are starting up: because they cannot intervene in these systems’ functions in flight, their only intervention is to decide not to take off if they fear something is amiss during their pre-flight. All other systems, including the autopilot in an aircraft and the so-called Autopilot in a Tesla, require the responsible human to constantly monitor and, when in doubt, intervene.
This pragmatic, operational perspective establishes my personal desiderata for the questions our human-automation interaction community should be addressing more fully. The first is that we need to address the full range of implications of a human operator being legally responsible for the outcomes of a machine’s actions. Even as automation is represented as off-loading humans—as taking on tiring tasks and reducing human error—the courts remind us that the human is ultimately responsible for the outcome and hence will be blamed for not taking control fast enough if the machine fails to meet the demands of any situation. This is a strange dichotomy in modern conversations: some claim the motivation for automation is the elimination of human error, yet when the automation fails, it is the human’s error for not disengaging it. The fragile truce in this authority-responsibility double-bind (thank you, Dave Woods! [1985]) is often framed as a problem of human complacency—that is, that everything would work just fine if the human operator would just pay attention and monitor the automation—to the point that insufficient vigilance smacks of moral failing or a violation of the Protestant work ethic.
I hope that we—the Cognitive Engineering community—will push back on this paradigm for two reasons. The first is pragmatic: we must push to the fore the questions of whether it is theoretically achievable for the human, given the real-time information available to them and their knowledge of what the automation might do with that information, to effectively identify when the automation is failing to meet the demands of the situation—and then whether it is theoretically achievable for the human, given a reasonable allowance for their own reaction time and the dynamic response of the system, to take control and recover the situation. The second reason is ethical: in situations where there are pragmatic concerns with the assumption of human responsibility for automation’s output, we, as the human-automation interaction community, should be articulating the safety concerns, the difficulties any human would have with capturing failures, and the impossible situations we sometimes place them in.
Second, getting into the details, we need to understand the monitoring and intervention tasks sufficiently to assess the taskload they create and the system understanding and real-time information they require. Despite the importance of monitoring, there is a marked lack of insight into how it occurs. Billman et al. (2020) recently focused on pilot monitoring of autoflight. In the process, they found that, to create their application-specific developments, they needed to construct fundamental representations (including sense-making and strategic regulation) because the literature does not provide general models of monitoring.
In my own work attempting to computationally model human-autonomy teams performing complex operations, my students and I found a surprising scarcity in the literature on how to describe monitoring. For example, does the act of “monitoring” the conduct of another agent’s activities involve only examining the outcomes of those activities, which implies sufficient expertise with the activities to have some intuition about whether the outcomes are correct and situationally appropriate? Or, does it also consider the inputs into those activities, which implies a parallel process, sufficiently intensive to assess whether the inputs make sense and can be trusted as good sources of information, whether the desired outcomes are being properly calculated, and whether the desired outcomes are then being sufficiently commanded to, and created by, the appropriate actuators? This distinction dramatically shapes the needs of the humans-who-are-monitoring, and should be explicitly considered in our methods for identifying training and information requirements, and for designing effective operating environments for them.
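One way to see how sharply these two readings differ in what they demand of the monitor is a schematic sketch in Python. All functions here are hypothetical stand-ins, not any published model: outcome-only monitoring needs just a judgment of whether the result looks plausible, while input-and-process monitoring also needs access to the inputs, its own checks on the sources, and a model with which to independently recompute the expected output.

```python
# Schematic contrast (hypothetical functions throughout) between the two
# notions of monitoring distinguished above.

def plausible(x):
    # Stand-in for the monitor's own sanity check on each information source.
    return x is not None

def outcome_monitor(observed_output, expected_range):
    """Outcome-only monitoring: requires enough expertise to judge whether
    the result looks correct and situationally appropriate."""
    low, high = expected_range
    return low <= observed_output <= high

def input_process_monitor(inputs, observed_output, model, tolerance):
    """Parallel-process monitoring: also vets the inputs and independently
    recomputes what the output should have been."""
    if not all(plausible(x) for x in inputs):      # can the sources be trusted?
        return False
    expected = model(*inputs)                      # is the right thing computed?
    return abs(observed_output - expected) <= tolerance  # and actually produced?

# Illustrative use: monitoring an agent that should be averaging two sensors.
average = lambda a, b: (a + b) / 2
print(outcome_monitor(12.0, expected_range=(0.0, 20.0)))            # True
print(input_process_monitor([10.0, 14.0], 12.0, average, 0.5))      # True
print(input_process_monitor([10.0, None], 12.0, average, 0.5))      # False
```

The second function is a far heavier task than the first, which is exactly why the distinction matters for training and information requirements.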
Similarly, what do we know about interventions? Here, my own work is reverting to first principles to characterize the ecology: analysis of the dynamics of the work can identify how often, and when, interventions may be required to address many classes of anomalies within the team. This can then be extended to identify when monitoring would be required to be able to trigger an intervention, accounting for the reaction time needed to get engaged and the time within the dynamics needed to stabilize or correct the problem, as sketched below.
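To make that timing logic concrete, here is a minimal sketch in Python. The anomaly classes and every number are hypothetical, and the function illustrates the first-principles reasoning above rather than implementing any published model: a monitoring check must catch an anomaly early enough that the human’s reaction time and the system’s stabilization time still fit within the dynamics before the situation becomes unrecoverable.

```python
def max_monitoring_interval(time_to_unsafe, reaction_time, stabilization_time):
    """Latest-detection reasoning: in the worst case an anomaly begins just
    after a check, so checks must be spaced closely enough that reaction and
    stabilization still fit before the unsafe boundary."""
    slack = time_to_unsafe - (reaction_time + stabilization_time)
    if slack <= 0:
        return None  # no monitoring schedule makes intervention feasible
    return slack

# Illustrative anomaly classes (all values in seconds, purely hypothetical).
anomalies = {
    "slow sensor drift":     {"time_to_unsafe": 600, "reaction_time": 30, "stabilization_time": 60},
    "autoflight mode error": {"time_to_unsafe": 45,  "reaction_time": 10, "stabilization_time": 20},
    "actuator hard-over":    {"time_to_unsafe": 8,   "reaction_time": 5,  "stabilization_time": 10},
}

for name, a in anomalies.items():
    interval = max_monitoring_interval(**a)
    if interval is None:
        print(f"{name}: intervention is infeasible under these dynamics")
    else:
        print(f"{name}: monitor at least every {interval:.0f} s")
```

Note that the third (made-up) case returns “infeasible”: no amount of vigilance lets the human recover, which is precisely the class of impossible situation flagged earlier.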
Third, we need to understand the impact of distributing monitoring duties and information across a network of interacting agents. Once the activity extends beyond one human and one automated system, monitoring becomes an important aspect of team coordination and the synchronization of activity. Monitoring may be implemented as an uncoordinated parallel activity (e.g., a single pilot monitors the autoflight system), but it is also often implemented in more coordinated ways. Mirroring the explicit two-person best practices for flight crews in administering checklists, for example, it may involve a progression of call-and-response activities. In robotic operations, it may involve the human supervisor first commanding the robot’s activity, being free for other tasks while the robot works, and then needing to confirm the outcome of the robot’s activity before the robot can be commanded to its next task; in flight operations, a rough parallel exists with air traffic clearances, in which an aircraft will not be cleared to take off or land until the air traffic controller clears them based on some monitoring of their situation. Each of these types of coordination may be purposefully invoked in the name of safety as a form of redundancy and cross-checking. However, the decision to do so should also recognize the degree to which they intertwine the behaviors of the agents on the team, making some wait on others to act, and others wait on the monitors to approve them. IJtsma (2019), for example, found that the inclusion of robots into extra-vehicular activity in space operations could require more time from the human astronauts (due to the interactions needed to monitor the outcomes of the robots’ actions) than just executing the actions themselves.
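As a toy illustration of how this coordination overhead can accumulate, consider the following sketch. The durations are entirely made up; the point is only the structure of the command, work, confirm pattern described above, in which the human’s monitoring duties can consume more of their time than simply doing the work themselves.

```python
def supervised_robot_time(n_tasks, command=2.0, robot_work=6.0, confirm=3.0):
    """Human time per task is command + confirm; the robot's work overlaps
    with the human's other duties, but the human cannot release the robot
    to its next task until the confirmation is done (tasks run in sequence)."""
    human_busy = n_tasks * (command + confirm)
    wall_clock = n_tasks * (command + robot_work + confirm)
    return human_busy, wall_clock

def manual_time(n_tasks, human_work=4.0):
    """If the human just does the task, busy time and elapsed time coincide."""
    t = n_tasks * human_work
    return t, t

human_busy, wall = supervised_robot_time(10)
manual_busy, manual_wall = manual_time(10)
print(f"supervised: human busy {human_busy} min, elapsed {wall} min")
print(f"manual:     human busy {manual_busy} min, elapsed {manual_wall} min")
# With these made-up numbers the human spends 50 min supervising versus
# 40 min just doing the work -- echoing IJtsma's (2019) observation.
```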
Similarly, monitoring is impacted by the distribution of information across the team. While there is value in a human supervisor using a separate, redundant information source to monitor others (including automation), Fickett’s research (in preparation) is characterizing how monitoring for some aspects of safety (e.g., that a small electric aircraft has sufficient charge to perform its next mission) has fundamental limits in its rate of correct detection, and a propensity for false alarms, depending on when the monitoring is completed relative to other activities (e.g., during safety checks before flight while the aircraft is still charging vs. delaying take-off for a final check) and on whether it uses the same information and knowledge as the agent performing the action (e.g., assessing the battery level alone, out of context, without the same understanding of the amount of energy that will be required given the condition and temperature of the battery, payload weight, winds, etc.). Thus, even in an operation where team members in theory share access to the same database, monitoring may itself have theoretical limits.
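A minimal sketch of this timing effect follows, with entirely hypothetical noise levels. This is not Fickett’s model, just an illustration of why an earlier, noisier check (taken while the state is still changing) has both a lower detection rate and more false alarms than a final check just before take-off.

```python
import random

def check_battery(margin_true, noise_sd, threshold=0.0):
    """The monitor sees the true energy margin plus observation noise and
    flags a problem when the observed margin falls below the threshold."""
    observed = margin_true + random.gauss(0.0, noise_sd)
    return observed < threshold

def alarm_rate(margin_true, noise_sd, trials=100_000):
    alarms = sum(check_battery(margin_true, noise_sd) for _ in range(trials))
    return alarms / trials

# Early check while charging: state of charge is still changing, so the
# effective observation noise is high; a final pre-flight check is cleaner.
print("early check, true shortfall :", alarm_rate(margin_true=-5, noise_sd=8))  # detection ~0.73
print("early check, margin adequate:", alarm_rate(margin_true=+5, noise_sd=8))  # false alarms ~0.27
print("final check, true shortfall :", alarm_rate(margin_true=-5, noise_sd=2))  # detection ~0.99
print("final check, margin adequate:", alarm_rate(margin_true=+5, noise_sd=2))  # false alarms ~0.01
```

The same logic extends to the information-asymmetry point: a monitor assessing battery level without the performer’s knowledge of payload, winds, and temperature is, in effect, observing through a wider noise distribution.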
Fourth, we need to better understand and support expert strategy selection. I believe I should design in sufficient flexibility to “let the user finish the design,” particularly since I don’t presume that any of my designs will anticipate all the extensions and variations that will be needed throughout their lifetimes. However, as a research community, we tend to focus on how operators will execute a fixed (often fairly small and focused) set of tasks, in which automation likewise has a fairly fixed role. At a more macro level, we don’t have good representations of the strategies by which human operators choose which tasks to complete, how well to do them, and how often to return to each; adjust their methods of coordinating with and monitoring each other; and decide which tools they will apply—including automation.
The decision to apply automation as part of these strategies involves more nuance than simply choosing to engage or disengage. An operator may choose to fully invoke automation to control a process but, leery of its capability in context, may also choose to monitor it closely; conversely, an operator may choose to manually perform a task, but so closely rely on automation’s indicators of what it would do that the machine determines their behavior. I believe that experts are well-adapted to their ecology, and thus their strategies are rational—a property that affords us mechanisms for their analysis and some measure of prediction. Similarly, operational practices such as Crew Resource Management (CRM) are building appropriate automation use into their guidance; to some degree, they may provide us with a description of work-as-done to build upon.
Fifth, we need to provide specific, actionable guidance to the developers of technology on what we need intelligent machines, automation, autonomous agents, and AI to be able to communicate in order to tip the balance in the human operator’s decision of whether it’s worth using a machine given the supervision it will require. Having become a supervisor (of people), I have learned that some are easy to supervise—they will cc me on relevant emails so that I know what they are up to, and give me a heads-up when circumstances start to heat up and they fear they will soon need my help (an ability that is particularly helpful). Others are more difficult to supervise—like automation, they are silent until they are deep in crisis, and cannot cogently explain to me whether they are capable of completing the task and, if so, what assistance they require—and I instinctively avoid allocating them to time-critical and safety-critical tasks.
Constructs such as “explainable AI” are under-specified, and can be interpreted by the technology designers as requiring only that the machine portray its reasoning framed in machine terms. A significant gap remains between these explanations and the specifics needed to understand how the machine may usefully contribute (or fail) in context. This is a gap between technology and operation that Cognitive Engineering can, and should, seek to address.
Going even further, given that almost no machine is fully autonomous, we can help articulate what further capabilities a machine needs to be a good subordinate in a safety-critical, time-critical environment. Here, I again use my insights as a supervisor. Even setting aside behaviors that require social skills, empathy, and emotional intelligence, a long list of fundamental constructs remains, such as the ability to perform basic sanity checks on odd-looking sensor readings and call out when information appears to be off—and the ability to detect and report fundamental violations of the assumptions made during design and safety assurance. Again, we can contribute by demonstrating where and how automation can be designed to at least detect the abnormal, and perhaps to see when it may contribute in such situations.
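As a sketch of what such a basic sanity check might look like (the assumption envelope, the sensor, and all numbers are hypothetical; the point is only that the machine reports violations of its design assumptions rather than silently proceeding):

```python
from dataclasses import dataclass

@dataclass
class DesignAssumption:
    """An envelope assumed during design and safety assurance (hypothetical)."""
    name: str
    low: float       # lowest value assumed during design
    high: float      # highest value assumed during design
    max_rate: float  # fastest plausible change per sample

def sanity_check(assumption, previous, current):
    """Return a human-readable callout if the reading violates the design
    envelope, or None if it looks plausible."""
    if not assumption.low <= current <= assumption.high:
        return (f"{assumption.name}: reading {current} is outside the assumed "
                f"range [{assumption.low}, {assumption.high}]")
    if previous is not None and abs(current - previous) > assumption.max_rate:
        return (f"{assumption.name}: jumped {abs(current - previous):.1f} in one "
                f"sample, faster than the assumed max of {assumption.max_rate}")
    return None

# Hypothetical example: an airspeed reading that suddenly drops to zero.
airspeed = DesignAssumption("airspeed (kt)", low=0.0, high=400.0, max_rate=15.0)
callout = sanity_check(airspeed, previous=250.0, current=0.0)
if callout:
    print("CALLOUT:", callout)  # the machine reports; the human decides
```

The essential behavior is the callout itself: a subordinate that announces “this reading violates what we assumed at design time” is far easier to supervise than one that carries on silently.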
In summary, I realize that I am taking a viewpoint here that is broader than that of Skraaning and Jamieson by cowardly dodging any definitional debates or fine points of distinction between specific lab studies. Instead, I am glad for the opportunity to voice my hope that our collective studies into automation failures seek to embrace the full complexity of the environment in which they occur and the many tasks that a team may need to juggle. This viewpoint needs to embrace the combination of cognition and machine operation that Cognitive Engineering is well-versed in, and further address the expectations created by leaving the human as being “ultimately responsible.”
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Aeronautics Research Mission Directorate (NASA Grant 80NSSC19K1702).
