Sage Journals: Discover world-class research

Abstract

Objective

The impact of the context in which automation is introduced to a decision-making system was analyzed theoretically and empirically.

Background

Previous work dealt with causality and responsibility in human-automation systems without considering the effects of how the automation’s role is presented to users.

Methods

An existing analytical model for predicting the human contribution to outcomes was adapted to accommodate the context of automation. An aided signal detection experiment with 400 participants was conducted to assess the correspondence of observed behavior to model predictions.

Results

The context in which the automation’s role is presented affected users’ tendency to follow its advice. When automation made decisions, and users only supervised it, they tended to contribute less to the outcome than in systems where the automation had an advisory capacity. The adapted theoretical model for human contribution was generally aligned with participants’ behavior.

Conclusion

The specific way automation is integrated into a system affects its use and the perceptions of user involvement, possibly altering overall system performance.

Application

The research can help design systems with automation-assisted decision-making and provide information on regulatory requirements and operational processes for such systems.

Keywords

human-automation interaction levels of automation function allocation human systems integration compliance and reliance

Introduction

Human operators collaborate with automated decision support systems to improve task outcomes in complex scenarios in domains, such as medical diagnostics (Esteva et al., 2017; Gardezi et al., 2019; Moreira et al., 2019; Rangayyan et al., 2010), ground transportation (Caballero et al., 2021; Lu et al., 2010; SAE, 2016), aviation (Billings, 1996; Pritchett, 2009; C. D. Wickens et al., 1998), military applications (Naseem et al., 2017; Scharre & Horowitz, 2015), and information technology (IT) security (Salloum et al., 2020). In such systems, humans and automation team up to improve task performance and achieve better outcomes. Such collaborative systems involve decision-making in various stages of the tasks, and both the human operators and automation have important roles in the process. However, what influence does each of them have on the outcome? Who should be held accountable if an incorrect decision is made and the task is deemed a failure? Who contributed more to the result and could therefore be assumed to have caused it, and by how much? Past studies addressed such questions by analyzing specific decisions after they were made. These analyses could not be used to estimate the influence of humans and automation on future decisions. Douer and Meyer (2020, 2021) proposed an information-theory-based model that provided such an a priori analysis. In this model, human causal responsibility is quantified as the proportion of the overall information in the outcome due to the unique information (i.e., reduction in uncertainty) the human contributed.

However, this model does not consider the system configuration and the context in which the automation is presented to the operator, which may affect actual or perceived human causal responsibility. Whether the automation is there to assist humans or to do their jobs, leaving them in supervisory roles, could significantly influence how humans interpret the automation’s recommendations or decisions and how they react to them.

Related Work

In systems where human operators collaborate with automation, the assignment of different activities to the operator and the automation defines the level of automation (LOA) (Endsley, 1987; Endsley & Kaber, 1999; Sheridan, 2012; Sheridan & Verplank, 1978). Using Sheridan’s 8-level scale (Sheridan, 2012), we can categorize the LOAs as either (a) a Decision Support System (DSS), where the automation can suggest an action but not execute it (levels 1–3), (b) Automated Decision-Making (ADM) where the automation can decide and execute the action while allowing the operator to override it (levels 4–5), or (c) an Autonomous System where the automation executes the action without an override option for the operator (levels 6–8). Such systems are sometimes referred to as (a) human in the loop (HITL), (b) human on the loop (HOTL), and (c) human out of the loop (HOOTL), corresponding to the classifications above (Scharre & Horowitz, 2015).

In decision-making processes, when people’s lives, health, or welfare are at stake, it is expected that someone can be held responsible for an incorrect decision that caused harm (Amoroso & Tamburrini, 2020; European Union, 2016; Pritchett, 2024). This is typically required for legal, ethical, and procedural reasons. However, in systems where humans and automation collaborate, it is not easy to attribute responsibility for outcomes. People may tend to ascribe blame to automation, whether it is an artificial intelligence (AI) system, robots, or autonomous cars (Cunningham et al., 2019; Furlough et al., 2021; Kneer & Stuart, 2021; Lima et al., 2023), reflecting concerns about these systems. Regulatory and legal requirements call for meaningful human control (MHC), according to which a human should be held accountable for failed outcomes. Still, automation is increasingly “blamed” for failures, raising the fundamental question of who is responsible for the outcome, the human or the automation, when decisions are made with advanced intelligent systems, and if they share the responsibility, how can it be divided between them? This question has been extensively explored in the literature, but mostly for legal and ethical responsibility (Gerstenberg & Lagnado, 2010; Lagnado et al., 2014; Matthias, 2004). Such an analysis also considers the designers of the automation, the managers or commanders who decided to deploy the system, and the operator’s behavior or decisions beyond the specific use of the system (e.g., if they chose to operate the system while they were tired and unable to function well). However, at the most fundamental level, these analyses of responsibilities do not specify the causal relationships between the different agents and the task outcome, which is better defined by causal responsibility, describing the level of contribution of each agent to the outcome (Hart, 2008). Quantifying causal responsibility would help determine humans’ influence and control on the outcome. Douer and Meyer (2020, 2021) proposed such a quantification by defining causal responsibility as an agent’s unique share in determining the system output’s (the implemented actions) probability distribution. Their Responsibility Quantification (ResQu) model analyzed a system in which automation provides the initial classification and decision that the human considers and determines whether to accept or override. According to this definition, human responsibility is the uncertainty left in the system after the automation issued its classification and recommendation. The ResQu model expresses this using the conditional entropy of the random variable representing the human’s decision (which in their model also represents the resulting action) given the automation’s recommendation, as a portion of the entropy of the resulting action, such that

R e s Q u_{ADM} = \frac{H (X | Y)}{H (Y)} = \frac{H (X, Y) - H (Y)}{H (X)}

(1)

with X and Y denoting the random variables representing the human decision and automation recommendation, respectively, and H(X,Y) being the joint entropy of X and Y. These entropy values are calculated based on analyses of past decisions (for existing, operating systems) or expected theoretical decision probabilities derived from theoretical models or simulations (for newly designed systems). Since entropy represents uncertainty in a random variable, H(X|Y) represents the amount of unique information humans contribute to compensate for the uncertainty left after the automation contributes its information. The information is attributed to the automation unless the human overrides its decision. Thus, if, in a certain situation, the human and automation would have made the same decision when working independently, then when they work together on the same decision under the same conditions, the decision (and the causal responsibility) would be attributed to the automation, as illustrated in Figure 1(a). This aligns with the ADM approach since human operators are merely there to supervise the automation’s operation, and the decision (and action) would have been made without them if they were not there or did not pay attention. We therefore marked the ResQu value in equation (1) with the “ADM” label, as it defines human responsibility in such ADM systems.

Figure 1.

An illustration of the information contributing to the final decision. X, X’, and Y represent the final decision, the human-only decision (without automation), and the automation’s recommendation. The orange-shaded area represents the information attributed to the human, which is translated into responsibility. (a) An ADM system, where the overlapping information is attributed to the automation; (b) a DSS system, where the human is also attributed the information they would have contributed if they were to decide without automation.

The ResQu model described above provides a quantitative indication of the human’s causal responsibility for ADM systems when the human is added to monitor the automation, which would have decided and acted without them if they were not present. However, such logic is challenging in the case of a DSS system, where humans make the decisions, and automation is added to assist them. To illustrate this situation, we can look at a specific (though extreme) case of all-knowing humans and automation, when both humans and automation are perfectly accurate and can always make the correct decision. In this case, we should consider that causal responsibility is associated with general type-causation in Probabilistic Causation (Eells, 1991; Hitchcock, 2021; Saad & Meyer, 2023). The outcome E is caused by an event C if and only if C raises the probability for E to occur, that is, $P (E | C) > P (E | \sim C)$ . Event C, in our case, would be the addition of the second agent to the system (a human for ADM and automation for DSS), and the result E would be defined as a correct decision. In the case of an ADM system, the automation is fully capable of making accurate decisions every time, so adding a human could not increase the probability of a correct decision (which is already equal to 1). The human is therefore not considered to cause the action, so their responsibility should be 0. On the other hand, in a DSS system, an unaided human would always make the right decision, so adding automation cannot increase the probability of a correct decision; the automation is not causing the action in any way, and the human would be entirely responsible.

The above scenario exemplifies how DSS and ADM systems may be viewed differently in the context of human control and responsibility. The difference between them, which is driven by the order in which classifications are made and the role of each agent, influences the decision-making system. This was illustrated, for example, in the European Union’s Artificial Intelligence (AI) Act that allows AI systems to “improve the result of a previously completed human activity” in high-risk systems, but they cannot be the first decision-maker (European Union, 2024). Hence, using a single ResQu model to determine human responsibility in all human and automation collaboration cases may be impossible. However, while it may be simple to determine responsibility in extreme cases, such as the one shown above, in most real-life situations, human and automation accuracy is neither 0 nor absolute, but something in between. Augmenting the ResQu model for quantifying human responsibility in DSS systems is important to allow regulators, managers, commanders, and systems designers to better address the questions of required human control, influence, and responsibility for the outcome.

Based on the above, we suggest that:

H1 Human operators’ contribution to the outcome is seen as lower in ADM than in DSS systems.

H2 Highly accurate humans are seen as contributing more in DSS systems since they can provide more valuable joint information to the decision-making.

H3 Humans view their responsibility to the outcome as higher in DSS than in ADM systems.

Updated Model

In the ResQu model, the human contribution to the outcome is the remaining uncertainty in the results after the automation has contributed its part. It aligns with ADM system behavior by attributing the shared information entirely to the automation. Suppose that in a certain situation, both the human and automation, when operating independently, would have made the same decision. The ResQu model would assume this was an “automation contribution” and assign it to “automation responsibility.” In DSS systems, however, the definition of human responsibility should be adapted to attribute such human and automation’s joint information to the human. In entropy terms, it means that the human responsibility (denoted in this case as ResQu_DSS) is the portion of the conditional entropy of the final decision X given the automation’s recommendation Y, plus information in the final decision that was contributed by the automation but would have been generated by the human if acting alone, as a portion of the entropy of the final decision, as presented in equation (2).

\begin{aligned} R e s Q u_{DSS} = \frac{H (X | Y) + I (X, Y, X^{'})}{H (X)} = R e s Q u_{ADM} + \frac{I (X, Y, X^{'})}{H (X)} \end{aligned}

(2)

with X’ being the random variable representing the human decision if they had made it alone, without the automation. Therefore, the human responsibility for the outcome when working with a DSS system equals the responsibility in an ADM system plus the additional information contribution (normalized) of the human, which they would have brought in an unassisted situation.

To illustrate how the difference between ADM and DSS systems is presented in realistic systems, we used an example of a uni-dimensional binary decision system. The same scenario will be tested empirically in the experiment described below, and this analysis is shown to predict the actual human behavior in the experiment. In this example, the human should decide whether rods are faulty or intact based on their length. The lengths of rods are normally distributed, with intact rods’ distribution being $E_{N} \sim N (μ_{N}, σ_{N}^{2}) = N (L_{0}, 1)$ and faulty rods’ distribution being $E_{S} \sim N (μ_{S}, σ_{S}^{2}) = N (L_{0} + d^{'}, 1)$ , with d′ being the detection sensitivity of the observer. The probability of a rod being faulty is P_S, and the value matrix ratio is defined as U = (V_TN − V_FP)/(V_TP − V_FN) = 0.6667. We analyzed the two cases separately: (1) ADM, where the automation makes the decision and takes an action that the human approves or overrides, and (2) a DSS, where the automation merely advises the human and lets them make the decision and take action. Using probability and signal detection theory (SDT) principles (Green & Swets, 1966; Macmillan, 2001; Sorkin & Woods, 1985; T. D. Wickens, 2002), the independent and joint entropy for the decision (human-only—X’; a human assisted by automation—X; and automation by itself—Y) can be calculated (see Supplemental Material for detailed calculations). The human responsibility can then be calculated using equations (1) and (2).

The analysis results, presented in Figure 2, illustrate how the context in which the automation is presented to the human operator can change the human’s causal responsibility for the outcome. When either the automation or the human sensitivity is low, ADM and DSS systems do not significantly differ in human responsibility. However, as human sensitivity increases, the human responsibility in DSS systems exceeds that of ADM systems, and the difference increases with higher automation sensitivity. This observation is important, considering that the development of technology tends to raise automation sensitivity. Furthermore, there is a desire to maintain meaningful human control, so human sensitivity needs to increase, as illustrated in Figure 2(a) and (b). However, when both the human and automation sensitivity increase, the context in which the automation is presented to the user becomes more critical. If the human’s role is to monitor and approve or override the automation’s decisions, as in ADM systems, their responsibility for the outcome can potentially drop below what would be considered the minimal responsibility for MHC, and regulatory requirements may not be met.

Figure 2:

Analytical model results of human responsibility for (a) ADM system with the original ResQu model and (b) DSS with the adjusted ResQu_DSS model. (c) The difference between the models as a function of the human and automation’s detection sensitivities d′. P_S = 0.3 and U = 0.6667. (a) Responsibility in ADM systems. (b) Responsibility in DSS systems. (c) Difference in responsibility: ResQu_DSS − ResQu_ADM.

While this analysis provides important insights, it assumes that the human operator behaves normatively (according to a model of optimal decision-making to maximize the values of outcomes). An empirical study can provide information about the correspondence of model predictions with actual behavior. An experiment was designed to explore such behavior and assess the effects of the decision-making context on decisions and outcomes.

Experiment

We conducted an experiment to study decisions and evaluations of human responsibility with automation advice in DSS and ADM systems. Based on the above analysis, we predicted that human responsibility would be judged as lower in the ADM configuration than in the DSS configuration.

Method

Participants

Participants in the experiment were recruited through Prime Panels (https://www.cloudresearch.com/products/prime-panels/) (Chandler et al., 2019), with a prerequisite of being at least 18 years old. The participants who completed the experiment received a small monetary incentive for completion based on the platform they were recruited through, and a higher incentive was promised to one of those who would achieve a high score. We had 8 groups in the experiment (see Table 1), so we included 400 participants to achieve the desired power of 1 − β > 0.95 with a moderate effect size of 0.2 (G*Power software, version 3.1.9.7 (Faul et al., 2007)). When analyzing the results, we removed 8 participants who abandoned the experiment before completion and 12 who clicked on the same button through all 40 tests (which suggests they did not take the experiment seriously). The remaining 380 participants still provided a power of 1 − β = 0.973. 5 participants preferred not to disclose any personal details, and of the remaining 375 participants, 234 were females and 139 males (2 classified themselves as “other”). The age distribution was 65+ (61%), 46–64 (27%), 31–45 (9%), and 18–30 (3%). 37% reported high-school education, 34% a BA/BSc degree, 14% an MA/MSc or PhD degree, and 14% preferred not to disclose their education. The research complied with the American Psychological Association Code of Ethics and was approved by the Institutional Review Board at Tel Aviv University. Informed consent was obtained from each participant.

Table 1.

Participants Groups.

Group No.	Test Type	$d_{A}^{'}$	$d_{H}^{'}$	Participants	Tests per Participant
1	ADM	1.65	1.65	50	40
2	ADM	1.65	3.0	50	40
3	ADM	3.0	1.65	50	40
4	ADM	3.0	3.0	50	40
5	DSS	1.65	1.65	50	40
6	DSS	1.65	3.0	50	40
7	DSS	3.0	1.65	50	40
8	DSS	3.0	3.0	50	40

Apparatus

The experimental system was a web application written in Python 3.9. The interaction with the application was through a standard web browser running on desktop computers, laptops, and smartphones. The system simulated the manufacturing of titanium rods, where the rods’ lengths were a one-dimensional measure of their quality: intact or faulty. Faulty rods were, on average, longer than intact rods. The participants were asked to determine if the manufactured rods were intact or faulty. Rods were faulty with a probability of P_S = 0.3, which means that 4800 rods out of the total 16,000 rods (400 participants x 40 tests each) were faulty. These were randomly assigned to the tests. Each participant had a certain accuracy level, $d_{H}^{'}$ , which was the possible detection sensitivity given the information shown. It was generated by presenting them with rods of length L₀ + ϵ for intact rods, and $L_{0} + d_{H}^{'} + ϵ$ for faulty rods, with L₀ = 4 length units and ϵ is a normally distributed random noise with zero mean and a variance of 1 $(E \sim N (0,1))$ that was sampled for each decision independently. The experiment could be viewed on various devices with different screen sizes, so “length units” were proportional to the viewing screen size of the participant’s web browser. The mean length of the intact rods L₀ was selected to be four times the standard deviation of intact rods’ lengths to practically eliminate situations where the visible sample rod would have a negative length (for which the probability was 0.00003, and it practically never occurred in the experiment).

The participants received advice from automation, which was presented as an ADM or a DSS. In an ADM system, the automation’s classification was described as the “QA system decision” with a graphical symbol of a quality seal (“Approved,” “Rejected”), and the participants had to choose between “Override Decision” or “Approve Decision.” For a DSS system, participants saw a “QA system recommendation” with a graphical representation of a simple colored circle, green (Hex color code: #4CAF50) for an intact system or red (#FF0000) for a faulty system. They had to choose whether it was a “Faulty Rod” or “Intact Rod” (Figure 3). Except for the above changes in the way some information was presented to the participants, the system behaved the same in both cases, that is, the system waited for the participants’ decision before moving to the next test, aligned with Sheridan’s Level 4 automation (Sheridan, 2012). The automation was assumed to function according to SDT with an optimal decision threshold and detection sensitivity of $d_{A}^{'}$ . This was achieved by sampling the length the automation would “see” in a trial (L₀ + ϵ for intact rods and $L_{0} + d_{A}^{'} + ϵ$ for faulty rods), and using this length in an SDT decision algorithm for a decision. The payoffs participants were presented with and the DSS algorithm used for calculating the threshold were a gain of 1 point for a correct decision, a loss of 2 points for manufacturing a faulty rod, and a loss of 1 point for rejecting an intact rod. After each decision, feedback was given on whether the decision was correct.

Figure 3.

Experiment screenshots for (a) ADM and (b) DSS systems. Text and graphics were differently designed to convey the differences between ADM and DSS system types.

Procedure

The participants were told they were quality assurance (QA) inspectors of a factory that manufactured titanium rods and were required to determine if the manufactured rods were intact or faulty. Half of the participants were presented with a DSS system. They were told that the system was there only to help them decide. In contrast, the other participants were presented with an ADM system and were instructed to monitor an intelligent system making decisions by itself and either accept or override its decisions as needed. The human and automation detection sensitivities were selected in the following four combinations:

(d_{A}^{'}, d_{H}^{'})

= (1.65,1.65), (1.65,3.0), (3.0,1.65), (3.0,3.0), resulting in 8 groups of participants (see Table 1). Based on the theoretical model (2), the differences in the responsibility between DSS and ADM for the four sensitivity combinations should be 0.08, 0.18, 0.18, and 0.44, respectively. Each participant was presented with 40 independent decisions and with the recommendation/decision of the automation for each. The first 10 decisions were used to allow them to learn the system and figure out the automation’s accuracy level. They were not included in the analysis of the results (lengthening the learning phase to 20 tests did not cause any significant change in the results). After completing 40 such decisions, each participant responded to 5 statements that addressed their subjective perception of various aspects of the experiment, as listed in Table 2. Each statement had six Likert-scale possible responses, ranging from Strongly Disagree to Strongly Agree (Likert, 1932; Simms et al., 2019).

Table 2.

Subjective Questions About the System and Own Performance and Responsibility.

Factor	Statement #	Statement
$d_{A}^{'}$	Q1	The QA system could identify well whether the machine was faulty or intact.
$d_{H}^{'}$	Q2	I was able to identify by myself (without the QA system’s help) whether the machine was faulty or intact.
Human influence	Q3	I mostly relied on my own perspective when making a decision.
Automation influence	Q4	I relied on the QA system’s indications more than on my own judgment.
Human responsibility	Q5	I felt responsible for the outcome.

Results

We calculated the probabilities to act and the joint probabilities for human and automation combinations to act. To calculate the entropy values defined above, we also needed the probability for human-only decisions to act (without the automation’s advice) and its joint probabilities with assisted human and automation-only decisions. Since we did not have human-only decisions (all participants were presented with automation advice), we used instead the calculated expected decision of a normative human using optimal decision criteria. The results for the actual task performance are included in the Supplementary Material.

Measured Responsibility

Figure 4(a) presents the measured human responsibility. An Analysis of Variance (ANOVA) of the results (see Table 3) showed that the system type influenced human responsibility, driving higher responsibility in DSS versus ADM systems, F (1,378) = 18.86, p < .001. Additionally, as expected, the human responsibility was lower for more accurate automation, F (1,378) = 33.17, p < .001. However, human accuracy had no significant effect on the responsibility, which could be explained by the significant interaction between human accuracy and the system type, F (1,372) = 13.03, p < .001. For DSS systems, higher human accuracy drove higher mean responsibility, F (1,195) = 7.4, p = .007, while, surprisingly, for ADM systems, the higher human accuracy reduced the mean responsibility, F (1,181) = 4.17, p = .042.

Figure 4.

Experiment results. (a) Human responsibility as a function of human accuracy. The automation accuracy is represented by squares $(d_{A}^{'} = 1.65)$ and dots $(d_{A}^{'} = 3.0)$ . The ADM system is in red, while the DSS system is in green. Vertical lines represent 95% CI. (b) Differences between ADM and DSS systems. The gray bars represent the difference in human responsibility between DSS and ADM. Black dots represent the theoretical, expected values per adapted ResQu model, and red squares show the subjective responsibility as perceived by the participants in their responses to survey question Q5.

Table 3.

ANOVA Results for Measured Responsibility.

	Responsibility		Q1 response		Q2 response		Q3 response		Q4 response		Q5 response
	MS	F	MS	F	MS	F	MS	F	MS	F	MS	F
d’_A	1.95	36.74***	13.88	7.47**	19.67	9.73**	1.92	1.05	12.28	5.09*	4.54	2.53
d’_H	0.01	0.27	10.5	5.65*	15.35	7.6**	0.02	0.01	0.38	0.16	6.23	3.47
TestType	1.18	22.24***	0.63	0.37	1.00	0.49	0.74	0.41	4.44	1.84	3.91	2.18
d’_A:d’_H	0.007	0.13	2.67	1.44	0.69	0.34	1.92	1.05	2.21	0.91	8.01	4.45*
d’_A:TestType	0.48	8.99**	0.26	0.14	2.52	1.25	0.02	0.01	0.00	0.00	0.86	0.48
d’_H:TestType	0.69	13.03***	3.19	1.72	2.00	0.99	0.73	0.40	5.22	2.16	11.14	6.20*
d’_A:d’_H:TestType	0.11	2.1	0.65	0.35	0.001	0	0.20	0.11	0.06	0.02	2.14	1.19

Significance codes: *p < .05; **p < .01; ***p < .001.

To better observe the differences in responsibility between ADM and DSS systems, the difference between the means of the measured responsibility in both systems (ResQuDSS − ResQuADM) is presented by the vertical bars in Figure 4b for the four tested $d_{A}^{'}$ and $d_{H}^{'}$ combinations. For reference, the theoretical expected values are presented (as black dots), and the empirical results seem to align with the same behavior as the theoretical models.

The participants’ subjective perceptions of the situation during the experiment were captured by their answers to the survey questions.

Assessing Human and Automation Accuracy

When asked about the accuracy of the automation (Q1), the participants ranked the more accurate automation as higher, F (1,378) = 7.4, p = .007. When they estimated their own accuracy (Q2), the more accurate participants rated themselves higher, F (1,378) = 7.19, p = .0076. Interestingly, we also observed that accurate participants tended to rate the automation higher in Q1, F (1,372) = 5.65, p = .018, and participants rated themselves higher in the presence of accurate automation, F (1,372) = 9.73, p = .002. It appears that the presence of an accurate agent (human or automation) influenced the subjective perception of the accuracy of the other agent.

Estimating influence: In their response to Q3 about perceptions of their self-reliance, no significant dependency was observed on the actual human accuracy. However, their responses increased with their subjective, self-perceived accuracy, F (1,376) = 49.8, p < .001, and decreased with their perception of the automation accuracy, F (1,376) = 8.74, p = .0033. Similarly, while their perceived automation influence (Q4) increased for higher actual automation accuracy, F (1,372) = 5.09, p = 0.025, we observed a much higher affinity with their perception of the automation accuracy as shown in their response to Q1, F (1,376) = 87.1, p < .001.

Determining Responsibility

Finally, the participants were asked about their subjective responsibility for the outcomes (Q5). The mean responses for this question are illustrated as red dots in Figure 4b. Neither the participants’ nor the automation’s actual accuracy had significant effects. However, there was a significant effect of the perceived accuracy of both (responses to Q1 and Q2) on the response to Q5 for both accurate and less accurate participants (for all groups: F (1,182)>14, p < .001). The effect of the system type (ADM or DSS) was only statistically significant for less accurate participants, F (1,182) = 5.85, p = .017. It seems that accurate participants held themselves responsible for the outcome in either system type.

Discussion

Automation is usually added to decision processes to assist humans or even replace them while usually allowing human oversight and the possibility to override the automation’s decisions. In sensitive systems, regulations usually require MHC to prevent possible problems that may arise in fully autonomous decision making. However, having a human operator in the decision-making process will not necessarily be sufficient to address this requirement adequately. Previous studies demonstrated that when the human’s detection sensitivity is inferior to the automation’s, the human’s causal responsibility for outcomes will be low, leading to potentially less-than-meaningful human control (Douer & Meyer, 2020, 2021). We demonstrated here that the human’s causal responsibility for the outcome also depends on the configuration of the context of the human involvement in the decision-making process.

In our model, the context is particularly important when both humans and automation are accurate. When the human is identified as the primary decision-maker and the automation is presented in an advisory capacity (DSS), the human’s causal responsibility is higher than when the automation is presented as the main decision-maker with the human supervising its work and accepting or overriding its decisions (ADM). The experiment showed the predicted results. Even though the participants generally assumed higher responsibility than optimally expected by the theoretical model, the dependency on the context of the automation’s assistance remained, and their actual responsibility was higher in DSS systems than in ADM systems. However, the participants’ average perception of their responsibility in such situations was surprisingly higher in ADM systems than in DSS systems (Fig. 4b), suggesting that humans tend to assume higher responsibility than their actual influence on the outcome.

When either the human or the automation was accurate while the other was not, the theoretical difference between DSS and ADM was much smaller. The actual measured responsibility difference was even smaller. It was almost negligible with more accurate participants who tended to hold themselves responsible regardless of the context. As predicted by the model, no significant differences were observed in the experiment when both human and automation accuracy were low.

The above analyses and findings provide several insights:

Human contribution to the outcome is context-sensitive, especially when both human and automation are accurate. As automation accuracy improves, considering the context in which automation is used in the system is essential to ensure MHC. On the one hand, one should strive to increase the human’s accuracy to maintain more substantial human influence on the outcome, above the minimum needed for MHC; see Figure 2 and Douer and Meyer (2020). However, as shown here, thereby the system’s context would have a stronger effect. A clear understanding of the context and clarifying it to the human operators becomes imperative to ensure MHC and compliance with regulatory requirements since, as shown in Figure 2, the human responsibility could vary significantly between extreme values (0.32 and 0.76), and could drop below the MHC threshold. This confirms our first hypothesis H1 that in DSS systems, the human contribution to the outcome is greater than in ADM systems. Furthermore, as shown in the model (Figure 2(c)) and empirically in Figure 4(b), the difference in the contribution is greater for accurate humans, both theoretically and in practice, confirming our hypothesis H2. This finding suggests that policymakers should carefully consider the context in which intelligent automation is deployed. Similarly to the EU requirements mentioned above (European Union, 2024), putting automation in the context of DSS can make the contribution of the human operator more meaningful, achieving the goal of having meaningful human control in the decision loop.

Humans overestimate their influence on the outcome, especially when they use accurate automation. This result has been observed in the past (Bartlett & McCarley, 2017; Douer & Meyer, 2021; Maltz & Meyer, 2001; Meyer, 2004), and our experiment helps to identify the behavior that leads to it. In the presence of accurate automation, human operators tend to overestimate their own accuracy and rate themselves higher. Then, since they assume responsibility based on their perceived accuracy (vs. their actual accuracy), they assume higher responsibility and follow the automation’s recommendations less than optimally required. This leads to situations where humans “listen” less to automation, especially when they could most benefit from it. Therefore, systems designers and process managers should clearly communicate to the operators the accuracy of the automation versus their own so that they gain the advantages it can provide. Interestingly though, while less accurate participants estimated their responsibility for the outcomes as higher in DSS than in ADM systems, as anticipated by our hypothesis H3, accurate participants viewed their responsibility as the same, or even higher, in ADM than in DSS (Figure 4(b)). This may be explained by the latter feeling more empowered to make decisions and therefore consider themselves responsible in either situation, even though their theoretical and practical contribution to the outcome was higher in DSS than in ADM.

Determining human’s causal responsibility should consider the human’s perceptions. The ResQu model computed causal responsibility with a simple analytical model. We have shown here that it is necessary to consider additional factors. The system type determines which information in the outcome was contributed by which agent and, in particular, to whom the joint information that could have been created by each agent independently should be attributed. This is not just a mathematical analysis of the information contributions, but, as was shown empirically, the participants in the experiment changed their behavior based on how they perceived the situation, ADM or DSS. Additionally, their estimate of the automation and their own accuracy were important in their decision to follow the automation. The fact that human operators’ perceptions, determined by how the system is presented to them, influence the level at which they would follow the automation’s advice aligns with findings in a different context, showing that positioning the automation as “AI” or a “Rule-based system” changed how humans reacted to its advice after it was wrong a few times (Candrian & Scherer, 2024).

Limitations and potential further research. Most participants in our study declared that they were older than 65. We do not expect that this impacted the results since the task was very simple, and they were sufficiently tech-savvy to connect and operate the experiment on their device. However, future work can study the potential influence of the participant’s age on their tendency to follow advice from ADM or DSS. The analysis also used the theoretical optimal decision criteria to estimate the human-only (unassisted) decision probabilities. This was done due to practical reasons (it was impossible to have the same person make independent decisions on the same scenario twice, with and then without automation). Still, it may have slightly influenced the results. Additionally, it may be of interest to explore how task complexity and criticality influence users’ following the automation advice by experimenting with more realistic scenarios resembling common high-stakes tasks. In complex decisions with extremely high failure costs, humans may rely more on automation, as it may give them a sense of confidence and transfer of liability.

Conclusions

As automation’s role in decision-making processes grows, the level of human involvement becomes an essential issue when considering where and how to deploy such automation. We demonstrated here that the context in which automation is presented is important to determine humans’ causal responsibility for the outcomes. This context also influences how humans act and make decisions while being advised by automation. Setting the system in the right context is necessary to ensure adequate behavior and proper analyses of human influence, ensuring MHC.

Key Points

• Defining a system as a DSS or ADM affects the human contribution to the process and how humans perceive their influence on the outcome.

• Human contribution is greater in DSS than ADM, especially when human sensitivity is higher.

• Humans overestimate their influence on the outcome, especially when they use accurate automation.

Supplemental Material

Supplemental Material - Context-Based Human Influence and Causal Responsibility for Assisted Decision-Making

Supplemental Material for Context-Based Human Influence and Causal Responsibility for Assisted Decision-Making by Yossef Saad, and Joachim Meyer in Human Factors.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is part of the first author’s PhD dissertation. This work was partly funded by the Israel Science Foundation Grant 2019/19 to the second author and by a grant from the Tel Aviv University Center for AI and Data Science (TAD).

ORCID iDs

Yossef Saad

Joachim Meyer

Supplemental Material

Supplemental material for this article is available online.

Author Biographies

Yossef Saad is a PhD student at Tel Aviv University. He received his BSc and MSc degrees in electrical engineering from the Technion, Israel Institute of Technology (1986) and Tel Aviv University (1992), respectively.

Joachim Meyer is the Celia and Marcos Maus Professor of Data Science at the Department of Industrial Engineering, Tel Aviv University, Israel, and a Fellow of HFES. He received his MA degree in psychology (1986) and a PhD degree in industrial engineering (1994) from Ben-Gurion University of the Negev, Beersheba, Israel.

References

Amoroso

Tamburrini

(2020). Autonomous weapons systems and meaningful human control: Ethical and legal issues. Current Robotics Reports, 1(4), 187–194. https://doi.org/10.1007/S43154-020-00024-3

Bartlett

M. L.

McCarley

J. S.

(2017). Benchmarking aided decision making in a signal detection task. Human Factors: The Journal of the Human Factors and Ergonomics Society, 59(6), 881–900. https://doi.org/10.1177/0018720817700258

Billings

C. E.

(1996). Human-centered aviation automation: Principles and guidelines (tech. rep.). NASA. NASA Ames Research Center. https://ntrs.nasa.gov/api/citations/19960016374/downloads/19960016374.pdf

Caballero

W. N.

Ríos Insua

Banks

(2021). Decision support issues in automated driving systems. International Transactions in Operational Research, 30(3), 1–29. https://doi.org/10.1111/itor.12936

Candrian

Scherer

(2024). How terminology affects users’ responses to system failures. Human Factors, 66(8), 2082–2103. https://doi.org/10.1177/00187208231202572

Chandler

Rosenzweig

Moss

A. J.

Robinson

Litman

(2019). Online panels in social science research: Expanding sampling methods beyond Mechanical Turk. Behavior Research Methods, 51(5), 2022–2038. https://doi.org/10.3758/s13428-019-01273-7

Cunningham

M. L.

Regan

M. A.

Horberry

Weeratunga

Dixit

(2019). Public opinion about automated vehicles in Australia: Results from a large-scale national survey. Transportation Research Part A: Policy and Practice, 129(1), 1–18. https://doi.org/10.1016/J.TRA.2019.08.002

Douer

Meyer

(2020). The responsibility quantification model of human interaction with automation. IEEE Transactions on Automation Science and Engineering, 17(2), 1044–1060. https://doi.org/10.1109/TASE.2020.2965466

Douer

Meyer

(2021). Theoretical, measured and subjective responsibility in aided decision making. ACM Transactions on Interactive Intelligent Systems, 11(1), 1–37. https://doi.org/10.1145/3425732

10.

Eells

(1991). Probabilistic causality. Cambridge University Press. https://doi.org/10.1017/CBO9780511570667

11.

Endsley

M. R.

(1987). The application of human factors to the development of expert systems for advanced cockpits. Proceedings of the Human Factors Society Annual Meeting, 31(12), 1388–1392. https://doi.org/10.1177/154193128703101219

12.

Endsley

M. R.

Kaber

D. B.

(1999). Level of automation effects on performance, situation awareness and workload in a dynamic control task. Ergonomics, 42(3), 462–492. https://doi.org/10.1080/001401399185595

13.

Esteva

Kuprel

Novoa

R. A.

Swetter

S. M.

Blau

H. M.

Thrun

(2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115–118. https://doi.org/10.1038/NATURE21056

14.

European Union . (2016). Article 22 - Automated individual decision-making, including profiling. https://gdpr-info.eu/art-22-gdpr/

15.

European Union . (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC). https://eur-lex.europa.eu/eli/reg/2024/1689/oj

16.

Faul

Erdfelder

Lang

A.-G.

Buchner

(2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146

17.

Furlough

Stokes

Gillan

D. J.

(2021). Attributing blame to robots: I. The influence of robot autonomy. Human Factors, 63(4), 592–602. https://doi.org/10.1177/0018720819880641

18.

Gardezi

S. J. S.

Elazab

Lei

Wang

(2019). Breast cancer detection and diagnosis using mammographic data: Systematic review. Journal of Medical Internet Research, 21(7), Article e14464. https://doi.org/10.2196/14464

19.

Gerstenberg

Lagnado

D. A.

(2010). Spreading the blame: The allocation of responsibility amongst multiple agents. Cognition, 115(1), 166–171. https://doi.org/10.1016/J.COGNITION.2009.12.011

20.

Green

D. M.

Swets

J. A.

(1966). Signal detection theory and psychophysics. John Wiley & Sons. https://psycnet.apa.org/record/1967-02286-000

21.

Hart

H. L.

(2008). Punishment and responsibility. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199534777.001.0001

22.

Hitchcock

(2021). Probabilistic causation. In Zlata

E. N.

(Ed.), The stanford encyclopedia of philosophy. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/spr2021/entries/causation-probabilistic/

23.

Kneer

Stuart

M. T.

(2021). Playing the blame game with robots. In Companion of the 2021 ACM/IEEE International Conference on Human-Robot Interaction, pp. 407–411. IEEE. https://doi.org/10.1145/3434074.3447202

24.

Lagnado

D. A.

Gerstenberg

Zultan

(2014). Causal responsibility and counterfactuals. Cognitive Science, 37(6), 1036–1073. https://doi.org/10.1111/cogs.12054

25.

Likert

(1932). A technique for the measurement of attitudes. Archives de Psychologie, 22(140), 5–55.

26.

Lima

Grgić-Hlača

Cha

(2023). Blaming humans and machines: What shapes people’s reactions to algorithmic harm. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1–26). IEEE. https://doi.org/10.1145/3544548.3580953

27.

S. N.

Tseng

H. W.

Lee

Y. H.

Jan

Y. G.

Lee

W. C.

(2010). Intelligent safety warning and alert system for car driving. Tamkang Journal of Science and Engineering, 13(4), 395–404.

28.

Macmillan

N. A.

(2001). Signal detection theory. In International Encyclopedia of the Social & Behavioral Sciences (pp. 14075–14078). Elsevier. https://doi.org/10.1016/b0-08-043076-7/00677-x

29.

Maltz

Meyer

(2001). Use of warnings in an attentionally demanding detection task. Human Factors, 43(2), 217–226. https://doi.org/10.1518/001872001775900931

30.

Matthias

(2004). The responsibility gap: Ascribing responsibility for the actions of learning automata. Ethics and Information Technology, 6(3), 175–183. https://doi.org/10.1007/S10676-004-3422-1

31.

Meyer

(2004). Conceptual issues in the study of dynamic hazard warnings. Human Factors, 46(2), 196–204. https://doi.org/10.1518/hfes.46.2.196.37335

32.

Moreira

M. W.

Rodrigues

J. J.

Korotaev

Al-Muhtadi

Kumar

(2019). A comprehensive review on smart decision support systems for health care. IEEE Systems Journal, 13(3), 3536–3545. https://doi.org/10.1109/JSYST.2018.2890121

33.

Naseem

Shah

S. T. H.

Khan

S. A.

Malik

A. W.

(2017). Decision support system for optimum decision making process in threat evaluation and weapon assignment: Current status, challenges and future directions. Annual Reviews in Control, 43(1), 169–187. https://doi.org/10.1016/J.ARCONTROL.2017.03.003

34.

Pritchett

A. R.

(2009). Aviation automation: General perspectives and specific guidance for the design of modes and alerts. Reviews of Human Factors and Ergonomics, 5(1), 82–113. https://doi.org/10.1518/155723409x448026

35.

Pritchett

A. R.

(2024). Things go wrong and the captain has to handle it. Journal of Cognitive Engineering and Decision Making, 18(4), 365–369. https://doi.org/10.1177/15553434241236536

36.

Rangayyan

R. M.

Banik

Desautels

J. E.

(2010). Computer-aided detection of architectural distortion in prior mammograms of interval cancer. Journal of Digital Imaging, 23(5), 611–631. https://doi.org/10.1007/S10278-009-9257-X

37.

Saad

Meyer

(2023). Quantifying levels of influence and causal responsibility in dynamic decision making events. ACM Transactions on Intelligent Systems and Technology, 15(1), 1–22. https://doi.org/10.1145/3631611

38.

SAE . (2016). Surface vehicle recommended practice - taxonomy and definitions for terms related to driving automation systems for on-road motor vehicles.

39.

Salloum

S. A.

Alshurideh

Elnagar

Shaalan

(2020). Machine learning and deep learning techniques for cybersecurity: A review. In Advances in Intelligent Systems and Computing, 1153 AISC (pp. 50–57). https://doi.org/10.1007/978-3-030-44289-7_5

40.

Scharre

Horowitz

M. C.

(2015). An Introduction to autonomy in weapon systems. Center for a New American Security. https://www.cnas.org/publications/reports/an-introduction-to-autonomy-in-weapon-systems

41.

Sheridan

T. B.

(2012). Human supervisory control. In Salvendy

(Ed.), Handbook of human factors and ergonomics (4th ed.). John Wiley & Sons Inc.

42.

Sheridan

T. B.

Verplank

W. L.

(1978). Human and computer control of undersea teleoperators (tech. rep.). Office of Naval Research.

43.

Simms

L. J.

Zelazny

Williams

T. F.

Bernstein

(2019). Does the number of response options matter? Psychometric perspectives using personality questionnaire data. Psychological Assessment, 31(4), 557–566. https://doi.org/10.1037/pas0000648

44.

Sorkin

R. D.

Woods

D. D.

(1985). Systems with human monitors: A signal detection analysis. Human–Computer Interaction, 1(1), 49–75. https://doi.org/10.1207/s15327051hci0101_2

45.

Wickens

C. D.

Mavor

A. S.

Parasuraman

MCgee

J. P.

(1998). The future of air traffic control. National Academies Press. https://doi.org/10.17226/6018

46.

Wickens

T. D.

(2002). Elementary signal detection theory. Oxford University Press.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.20 MB