Abstract

In target detection tasks, false alarms (i.e., indicating a target is present when it is absent) decrease trust more than misses. Furthermore, human advisors providing advice at the same time as automation may impact how users trust and subsequently rely on automated aids. This study aimed to understand whether the false alarm rate (FAR) of an automated target recognition aid impacts trust in the automated aid, trust in a human teammate, or operator self-confidence in a dual-advisor target detection task. Participants completed a mine detection task while receiving advice from a human and an automated advisor. The FAR of the automation was manipulated between groups and trust in each type of advisor was measured. Automation FAR did not influence trust in the automation. Low FAR automation was associated with higher trust in a human teammate and increasing self-confidence over the course of the experiment.

Keywords

signal detection, false alarm rate, defence, military psychology

Background

Automated target recognition systems (ATRs) are used to locate items of interest within a noisy environment. ATRs perform well in defined tasks but reliability can decline in unanticipated situations, so they are often paired with a human operator. In these human-automation teams, even imperfect ATRs can improve performance (Reiner et al., 2017).

One application of ATRs is the detection of underwater mines, which are a prominent threat to maritime operations (Ho et al., 2011). A leading mine countermeasure is the evaluation of SONAR images of the sea floor (Figure 1) by operators to determine whether a mine is present in areas of interest (Hammond et al., 2021). Given the poor quality of SONAR images, relatively rare occurrence of a mine, and the high density of clutter on the seabed, this is a challenging task. To assist operators in the detection and classification of mine-like objects, ATRs are employed to cue operators to the location of mine-like objects for further inspection. However, industry has reported concerns of widespread disuse of these systems.

Figure 1.

Example of an experimental trial. Participants were initially presented with the image without advice and asked to give their initial impression. Advice was then presented as depicted below (square around a potential mine location if the agent indicated a mine was present and text below the SONAR image) and participants entered their final assessment.

Trust and Self-Confidence

While disuse is concerning, complete dependence on ATRs would defeat the purpose of human-automation teams; the greatest performance is expected when dependence is calibrated to the capabilities of a system (Lee & See, 2004). The decision to depend on an agent is believed to stem largely from trust (Hoff & Bashir, 2015), which is the attitude that an agent will help achieve an individual’s goals in a situation characterized by uncertainty and vulnerability (Lee & See, 2004, p. 51). Dependence decisions are believed to arise from a weighing of trust in each involved agent (Merritt et al., 2015). While working with ATRs, the operator is expected to compare their trust in the system with their self-confidence in performing the task and align their decision with the more trusted agent (cf. Williams et al., 2023). Appropriate dependence is then achieved by calibrating trust to each agent's true capabilities (Lee & See, 2004).

The perfect automation schema (PAS) describes a tendency for operators to believe that automated systems perform perfectly, and a subsequent distrust in the system once errors are observed (Dzindolet et al., 2002). Catching blatant mistakes undermines trust in the system and bolsters the operator’s self-confidence (Madhavan et al., 2006). Even when errors are not blatant, operators tend to inaccurately attribute poor task outcomes to characteristics of automated systems rather than their own capabilities (Douer & Meyer, 2022). Since highly reliable systems remain imperfect in most applied settings, disuse of automated systems often stems from poorly calibrated trust that is below the system’s capabilities.

False Alarm Rate

In target detection (i.e., signal detection), the environment is classified into two possible states: target present or target absent. The combination of an operator or system response and the true state of the world creates four outcome categories: hit, miss, correct rejection and false alarm (FA; Wickens et al., 2021). While the overall error rate of ATRs is typically limited by technological capabilities, the type of errors made can be adjusted by changing the system’s threshold to recognize an object as a target, expressed as the response bias (see Signal Detection Theory; Wickens et al., 2021). While a liberal (low) response bias is expected to produce the highest number of hits (i.e., mines detected, a desired result), the low threshold also results in a greater number of FAs, which may be problematic. Frequent FAs may incite a cry wolf effect (Breznitz, 1983) where a belief of low alarm reliability diminishes alarm responses (Bliss et al., 1995). Excessive FAs appear to reduce trust, leading to disuse and neglect of alarms for hits (Culley & Madhavan, 2013; Manzey et al., 2014).
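As an illustration of the four outcome categories and of response bias, the following Python sketch classifies a single trial and computes the standard equal-variance Gaussian signal detection indices (sensitivity d′ and criterion c). The function names and the example hit and false alarm rates are hypothetical and are not taken from the study; they only show how two systems with the same sensitivity can differ in the type of error they make.

```python
from statistics import NormalDist


def classify_trial(target_present, responded_present):
    """Map the true state and the response onto one of four outcomes."""
    if target_present:
        return "hit" if responded_present else "miss"
    return "false alarm" if responded_present else "correct rejection"


def sdt_indices(hit_rate, fa_rate):
    """Equal-variance Gaussian indices.

    hit_rate is P(respond present | target present) and fa_rate is
    P(respond present | target absent). A negative criterion is liberal.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical rates: same sensitivity, opposite response bias.
liberal = sdt_indices(hit_rate=0.90, fa_rate=0.30)
conservative = sdt_indices(hit_rate=0.70, fa_rate=0.10)
print("liberal:      d' = %.2f, c = %.2f" % liberal)
print("conservative: d' = %.2f, c = %.2f" % conservative)
```

In this sketch the liberal system produces more hits at the cost of more FAs, while the conservative system trades FAs for misses, with sensitivity unchanged.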

Dual Advisor Scenarios

While there is a significant body of literature on the factors that influence dependence when a single operator interacts with an automated system, there is limited research considering more complex multi-human teams. Merritt et al. examined trust and dependence in a single-advisor (2011) and a dual-advisor (2015) target detection task. While trust was associated with dependence on the ATRs throughout the single advisor task, in the dual advisor setting this was only true for the first half of the study, suggesting trust may evolve differently in dual advisor settings. These findings highlight the need for further research into the role of automated systems in multi-human environments.

Biases to distrust automated systems may be exacerbated by additional teammates. Since humans are expected to commit errors, the literature suggests that no perfect human schema exists (Merritt et al., 2014). When advice is offered from automated and human sources, trust in the automated source may be disproportionately impacted (Madhavan & Wiegmann, 2007). Since there are only two possible recommendations in binary outcome tasks (i.e., target present or absent), disagreements between dual advisors will always place the operator’s impression in a 2-1 majority. If dependence decisions are determined by weighing trust in agents, a guaranteed 2-1 majority promotes self-reliance and disuse of all advice (Merritt et al., 2015). Disuse of aids in a dual advisor context, as well as a disproportionate effect of FAs on trust in ATRs, may explain the disuse of ATRs observed in industry. Few studies have examined the influence of FAs on trust and dependence behaviors in complex environments, and no studies to date have examined the effect of FAs in a dual advisor task, nor have they measured trust in each agent and operator self-confidence simultaneously.
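The 2-1 majority argument can be checked by brute force. The sketch below, with hypothetical function and variable names, enumerates every combination of binary recommendations and confirms that whenever the two advisors disagree, the operator's own impression is on the majority side.

```python
from itertools import product


def majority_vote(votes):
    """Return True ("mine present") if more than half of the votes are True."""
    return sum(votes) > len(votes) / 2


def disagreement_cases():
    """List every case in which the two advisors give different advice."""
    cases = []
    for operator, atrs, co_commander in product([True, False], repeat=3):
        if atrs != co_commander:
            majority = majority_vote([operator, atrs, co_commander])
            cases.append((operator, atrs, co_commander, majority))
    return cases


cases = disagreement_cases()
for operator, atrs, co_commander, majority in cases:
    print(
        "operator=%s atrs=%s co_commander=%s majority=%s"
        % (operator, atrs, co_commander, majority)
    )
```

Because one advisor must agree with the operator whenever the advisors disagree with each other, the majority recommendation in every listed case equals the operator's initial impression.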

Purpose and Hypotheses

This study aimed to understand whether the false alarm rate of an ATRS impacts trust in the ATRS, trust in a human teammate, or self-confidence while completing a dual-advisor target detection task. Participants completed a mine detection task with advice from an imperfect ATRS and a human teammate. Critically, the reliability level (i.e., the total number of errors) was held approximately constant between the human teammate and ATRS; however, the response bias of the ATRS was manipulated between groups (High FAR group: the system made more FA than miss errors; Low FAR group: the system made fewer FA than miss errors). We measured trust in the ATRS and human teammate, as well as self-confidence, at baseline and after each of three experimental blocks.

Stemming from Merritt et al.’s (2015) finding that in a binary outcome task (i.e., target present, or not) the presence of dual advisors leads to advice disuse in favor of self-reliance and decreased trust in both a human and automated advisor, we hypothesized that as participants became more familiar with the task, their self-confidence would increase across the blocks and trust in their human teammate would decrease. As participants completed trials, and observed errors made by the ATRS, we predicted that incongruities between a PAS and reality would result in decreased trust in the ATRS. Finally, since FAs were expected to be more salient than misses and result in more evident violations of a PAS, we hypothesized that trust in the ATRS would decline more rapidly in the high FAR group compared to the low FAR group.

Methods

Participants

Fifty people (37 female, 12 male, 1 not disclosed) with an average age of 19.98 years (SD = 1.87) were recruited from a psychology student pool. Participants self-screened for normal or corrected-to-normal vision. To encourage engagement in the task, participants were told they would receive a 10 CAD honorarium if they were a top performer. However, all participants received a 10 CAD honorarium and credit towards a psychology course regardless of their performance. This study was approved by the Dalhousie University Research Ethics Board [REB 2019-4935] and informed consent was obtained from all participants.

Design

Participants completed an underwater mine detection task assisted by an ATRS and a human teammate (herein known as the co-commander). The simulated task was presented using a custom MATLAB (R2017b) script. Participants were seated in front of a monitor (resolution 2048 × 1152) and used a computer mouse to interact with the simulation.

In each trial, a SONAR image of the seabed was displayed on the monitor. Participants were asked to indicate an initial impression of whether a mine was present by clicking either over the area they believed to contain a mine or on a “no mine present” button (Figure 1). Once their initial impression was entered, advice from the ATRS and the co-commander was displayed at the bottom of the screen. If either agent advised that a mine was present, a blue (ATRS) or red (co-commander) box appeared around the suspected mine location. Participants then indicated their final assessment by clicking on the mine location or the “no mine” button. This technique of two responses (initial impression and final assessment) for each trial has been used previously (Merritt et al., 2015) and aims to separate dependence on aids from personal judgements and aid disuse. The study consisted of 340 trials spread over three blocks (114, 113 and 113 trials respectively). Each trial consisted of one image. Images were presented in a randomized order for each participant.

Participants were assisted by an ATRS whose response bias differed between groups. The ATRS with a high FAR made FA errors on 25% of the trials and miss errors on 5.6%, with the error percentages reversed for the low FAR ATRS. Importantly, both versions of the ATRS were approximately 70% reliable (errors in 30.6% of trials) and differed only in the type of error that was made. Group assignment was randomized, with participants blind to their group as well as the FAR and reliability of the ATRS.

The human teammate was framed as a “co-commander” who could make recommendations, and the participant was the “lead commander” who made the final assessment regarding the presence of a mine. The co-commander’s advice maintained a reliability similar to that of the ATRS but had a neutral response bias: FA and miss errors were each made in 13.5% of trials (73% reliable). The FARs of the two versions of the ATRS were therefore high and low, respectively, relative to each other and to the co-commander. The reliability of both agents reflects the 70% minimum reliability at which automated aids are expected to improve performance compared to manual tasks (Wickens & Dixon, 2007), while maximizing trials in which errors could be observed to differentiate the FAR groups.
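The error rates above can be tallied in a few lines. This is a minimal sketch with hypothetical names; it assumes, as the text states, that each percentage is a proportion of all trials rather than a rate conditional on a mine being present or absent.

```python
def reliability(fa_rate, miss_rate):
    """Proportion of trials with correct advice, given per-trial error rates."""
    return 1.0 - (fa_rate + miss_rate)


# (FA errors, miss errors) as proportions of all trials, from the Design section.
agents = {
    "high FAR ATRS": (0.25, 0.056),
    "low FAR ATRS": (0.056, 0.25),
    "co-commander": (0.135, 0.135),
}

reliabilities = {name: reliability(fa, miss) for name, (fa, miss) in agents.items()}
for name, value in reliabilities.items():
    print("%-14s reliability = %.1f%%" % (name, 100 * value))
```

Both versions of the ATRS come out at 69.4% (reported as 70%) and the co-commander at 73%, so the agents differ mainly in how their errors are split between FAs and misses.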

Measures

Questionnaires were used to evaluate trust in each agent. Trust in the ATRS was evaluated with Jian et al.’s (2000) Checklist for Trust between People and Automation. Trust in the co-commander and self-confidence were assessed with modified versions of this questionnaire (e.g., “I am suspicious of the system’s intent, action, or outputs” was changed to “I am suspicious of my co-commander’s ability to identify mines”). Similar questionnaire modification was previously used by Knocton et al. (2021). Dependence and performance data, as well as signal detection measures of response bias and sensitivity, were gathered but given the space constraints will be reported in a separate publication.

Procedure

Participants attended a single 1.5-2 hour session. They were told they would be working with another participant; however, the other person was a member of the research team. It was explained that they would be randomly assigned to one of two roles: one teammate would be responsible for determining whether a mine was present (lead commander) and the other would offer advice (co-commander). To present the illusion of randomized roles, participants were asked to draw their role from a hat, which always indicated that they would be the lead commander. The co-commander (confederate) was moved to the adjacent room for the remainder of the study under the premise that another (fictitious) researcher was set up there so the co-commander could enter their recommendations ahead of the lead commander. Participants were told the recommendations would be presented on screen to prevent communication between the teammates and to avoid distracting each other. In actuality, the co-commander’s recommendations were programmed into the simulation to control the reliability and FAR.

Once separated, the participant read a script that described the mine detection task. Then they completed a 50-trial training block with feedback after each trial but without advice from the ATRS or the co-commander. A second eight-trial training block demonstrated how the advice would be displayed, and the process of indicating an initial impression and final assessment. Participants were advised that they could use the advice as much or as little as they would like to achieve the greatest performance.

Upon completing the training blocks, the three trust questionnaires were completed to gather a baseline measure of their trust in the ATRS, trust in their co-commander, and confidence in their own ability to complete the mine detection task. Participants then completed three experimental blocks with advice from the ATRS and the co-commander. After each block they completed the three trust questionnaires (ATRS, co-commander, and self-confidence). The task was self-paced; however, they were advised that each block should take no more than 15 minutes.

Data Analysis

Data were formatted in MATLAB and were analyzed with IBM SPSS Software (v26). Items on the trust and self-confidence questionnaires were reversed where necessary so increasing values were associated with greater trust or confidence. Scores were averaged by participant for each questionnaire and a 2 (Group: High FAR, Low FAR) by 4 (Block: baseline, B1, B2, B3) mixed ANOVA was completed for each with group as the between-subjects factor.
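The reverse-scoring and averaging step can be sketched as follows. The study's own scoring was done in MATLAB and SPSS, so this Python version is only illustrative; the 7-point response scale, the item responses, and the choice of which items are reversed are assumptions made for the example.

```python
def reverse_score(response, scale_min=1, scale_max=7):
    """Flip a response so that a high value becomes low and vice versa."""
    return scale_min + scale_max - response


def questionnaire_score(responses, reversed_items, scale_min=1, scale_max=7):
    """Average the items after reversing the negatively worded ones.

    reversed_items holds the zero-based positions of items to reverse.
    """
    adjusted = [
        reverse_score(r, scale_min, scale_max) if i in reversed_items else r
        for i, r in enumerate(responses)
    ]
    return sum(adjusted) / len(adjusted)


# Hypothetical participant: item 0 is negatively worded ("I am suspicious ...").
example = questionnaire_score([2, 6, 7], reversed_items={0})
print("mean trust score: %.2f" % example)
```

After reversal, a higher mean always indicates greater trust or confidence, which is what the mixed ANOVA on the per-participant scores requires.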

Results

All results were corrected for degree of departure from sphericity with Greenhouse-Geisser estimates. Figure error bars show the SEM.
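For readers unfamiliar with the correction, the Greenhouse-Geisser procedure multiplies both degrees of freedom of a within-subjects effect by an estimate ε of the departure from sphericity. The sketch below is illustrative only: the function name is hypothetical, and ε is not reported in the text, so it is back-calculated here from the corrected error degrees of freedom reported for the Block effect on trust in the ATRS (77.33) and the uncorrected value implied by the design (4 blocks, 50 participants, 2 groups).

```python
def greenhouse_geisser_df(epsilon, n_levels, n_subjects, n_groups):
    """Corrected (effect, error) degrees of freedom for a within-subjects
    factor in a mixed design."""
    df_effect = n_levels - 1
    df_error = (n_levels - 1) * (n_subjects - n_groups)
    return epsilon * df_effect, epsilon * df_error


# Uncorrected error df for Block: (4 - 1) * (50 - 2) = 144.
implied_epsilon = 77.33 / 144  # assumption: recovered from the reported df
df_effect, df_error = greenhouse_geisser_df(
    implied_epsilon, n_levels=4, n_subjects=50, n_groups=2
)
print("epsilon = %.3f, corrected df = (%.2f, %.2f)" % (implied_epsilon, df_effect, df_error))
```

Under these assumptions the corrected effect degrees of freedom round to 1.61, matching the value reported for that test.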

Trust in ATRs

There was no main effect of FAR Group on trust in automation, F (1, 48) = .543, p = .465, η²p = .01. There was a significant main effect of Block, F (1.61, 77.33) = 9.614, p = .001, η²p = .17, which followed a negative linear trend, F (1, 48) = 13.566, p = .001, η²p = .22, indicating that, as participants completed the task, their trust in the aid decreased (Figure 2). There was no interaction between FAR Group and Block on reported trust in automation, F (1.61, 77.32) = .95, p = .374, η²p = .02, indicating that the FAR of the ATRS did not impact trust in the ATRS.

Figure 2.

Reported trust in the automated target recognition system (ATRS). Blocks one to three were assessed following each experimental block.

Trust in Co-Commander

There was a significant main effect of FAR Group, F (1, 48) = 4.063, p = .049, η²p = .08, where the Low FAR Group had greater trust in their co-commander than the High FAR Group (Figure 3). There was no main effect of Block on participants’ trust in their co-commander, F (2.61, 125.20) = 1.086, p = .352, η²p = .02, and no interaction between Group and Block, F (2.61, 125.20) = 1.845, p = .142, η²p = .04. Although the interaction was not significant, an independent samples t-test was conducted to confirm there was no difference in trust in the co-commander at baseline between the Low and High FAR Groups, t (48) = .636, p = .528, d = .18.
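The effect sizes in this section can be recovered from the test statistics with two standard identities: η²p = F·df1 / (F·df1 + df2) for an F test, and d = t·√(1/n1 + 1/n2) for an independent samples t-test. The sketch below uses hypothetical function names, and the group sizes of 25 are an assumption inferred from the total sample of 50 and the within-group degrees of freedom reported later, not a figure stated in the text.

```python
from math import sqrt


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_d_from_t(t_value, n1, n2):
    """Cohen's d recovered from an independent samples t statistic."""
    return t_value * sqrt(1.0 / n1 + 1.0 / n2)


group_effect = partial_eta_squared(4.063, 1, 48)
block_effect = partial_eta_squared(1.086, 2.61, 125.20)
interaction = partial_eta_squared(1.845, 2.61, 125.20)
baseline_d = cohens_d_from_t(0.636, 25, 25)  # assumed equal groups of 25

print("Group: %.2f  Block: %.2f  Interaction: %.2f  d: %.2f"
      % (group_effect, block_effect, interaction, baseline_d))
```

Rounded to two decimals, these reproduce the values reported above for the co-commander analysis (.08, .02, .04, and .18).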

Figure 3.

Reported trust in the co-commander. Blocks one to three were assessed following each experimental block.

Self-Confidence

There was no main effect of FAR Group on self-confidence, F (1, 48) = 2.03, p = .161, η²p = .04. There was a main effect of Block, F (2.32, 111.28) = 14.57, p < .001, η²p = .23, but this was superseded by a significant interaction effect of Group and Block, F (2.32, 111.28) = 4.59, p = .009, η²p = .09. To follow up on the effects of FAR Group and Block, a one-way ANOVA with trend analysis over Block was completed for each Group. A significant effect of Block was found within the Low FAR Group, F (2.22, 53.22) = 16.28, p < .001, η²p = .40, with a linear trend, F (1, 24) = 21.24, p < .001, η²p = .47, where self-confidence increased across Block (Figure 4). No effect of Block on self-confidence was found in the High FAR Group, F (2.07, 49.59) = 1.64, p = .204, η²p = .06 (Figure 4).

Figure 4.

Reported self-confidence. Blocks one to three were assessed following each experimental block.

Discussion

The present study examined the effects of a low and high FAR of an ATRS in a dual advisor mine detection task. While self-confidence was expected to increase across the blocks, it increased only in the low FAR group. Trust in the co-commander was expected to decrease across the blocks. Instead, it was higher in the low FAR group and remained steady across the blocks for both groups. As anticipated, trust in the ATRS decreased across the blocks for both groups; in line with the PAS, both misses and FAs eroded trust. While we expected that FAs would be salient and erode trust more rapidly in the high FAR group compared to the low FAR group, there was no effect of FAR group. Madhavan et al. (2006) reported a similar null effect of FAR on trust in an ATRS in a single advisor setting. However, it was surprising that the FAR of the ATRS influenced trust in the co-commander and self-confidence, but not trust in the ATRS.

Trust in the ATRS may have remained the same between FAR groups due to the description of the system; participants were told that the ATRS compared objects in the images against a database of known mine characteristics and cued operators to objects with these mine-like characteristics. Although not intended to do so, this information could have acted as justification for why the ATRS may err while still performing as intended. Providing participants with rationale for errors has been noted to maintain trust and dependence following errors (Dzindolet et al., 2002) which might explain why salient FAs did not disproportionately impact trust in the ATRS here.

While FAs did not decrease trust in the system, participants in the low FAR (high miss rate) group may have located mines that were not detected by the ATRS and attributed this to their own abilities, bolstering self-confidence. Similarly, when the co-commander cued participants to areas not cued by the ATRS, it may have appeared that the co-commander performed better than the ATRS, increasing their trust in the co-commander without disproportionately influencing their trust in the ATRS. In this circumstance, objects that were perceived as missed by the ATRS but detected by the co-commander may have resulted in salient ATRS misses in the low FAR group. Conversely, for the high FAR group, if FAs were assumed to be true hits, the appearance that the expert system was detecting mines may have been simply interpreted as the co-commander and participant themselves performing as expected, resulting in steady trust and self-confidence throughout. Immediate feedback may have made the errors, and the manipulation, more obvious to the participants. However, if immediate confirmation was plausible in this task there would be little use for the operator or the ATRS. Further work will examine dependence (i.e., whether the participant switched their initial impression to agree with one of the advisors), performance, and signal detection measures. To determine whether these findings are unique to the FAR of an ATRS, future studies should manipulate the FAR of the co-commander.

As suggested by Merritt et al. (2015), concepts established in single-operator settings do not appear to extend to complex multi-human-automation settings. Additional research is required to understand how automated systems can be integrated into human teams.

Take Aways

The false alarm rate of an automated aid may impact the relative trust of other agents involved (i.e., a human-teammate or self-confidence) rather than trust in the system directly.

Concepts derived from human-automation research may not apply to multi-human automation teams.

Footnotes

ORCID iD

Grace Barnhart

References

Bliss, Dunn, & Fuller, B. S. (1995). Reversal of the cry-wolf effect: An investigation of two methods to increase alarm response rates. Perceptual and Motor Skills, 80(3, Pt 2), 1231–1242. https://doi.org/10.2466/pms.1995.80.3c.1231

Breznitz (1983). Cry-wolf: The psychology of false alarms. Hillsdale, NJ: Lawrence Erlbaum.

Culley & Madhavan (2013). Trust in automation and automation designers: Implications for HCI and HMI. Computers in Human Behavior, 29(6), 2208–2210. https://doi.org/10.1016/j.chb.2013.04.032

Douer & Meyer (2022). Judging One’s Own or Another Person’s Responsibility in Interactions with Automation. Human Factors, 64(2), 359–371. https://doi.org/10.1177/0018720820940516

Dzindolet, M. T., Pierce, L. G., Beck, H. P., & Dawe, L. A. (2002). The perceived utility of human and automated aids in a visual detection task. Human Factors, 44(1), 79–94. https://doi.org/10.1518/0018720024494856

Hammond, T. R., Midtgaard, Ø., & Connors, W. A. (2021). A Bayesian Network Approach to Evaluating the Effectiveness of Modern Mine Hunting. Remote Sensing, 13(21), 4359. https://doi.org/10.3390/rs13214359

Ho, Pavlovic, N. J., Arrabito, & Abdalla (2011). Human Factors Issues When Operating Unmanned Underwater Vehicles (Technical Report DRDC Toronto TM 2011-100). Toronto, ON: Defence R&D Canada.

Hoff, K. A., & Bashir (2015). Trust in automation: Integrating empirical evidence on factors that influence trust. Human Factors, 57(3), 407–434. https://doi.org/10.1177/0018720814547570

Jian, Bisantz, & Drury (2000). Foundations for an Empirically Determined Scale of Trust in Automated Systems. International Journal of Cognitive Ergonomics, 4(1), 53–71. https://doi.org/10.1207/S15327566IJCE0401_04

Knocton, Hunter, Connors, Dithurbide, & Neyedli (2021). The Effect of Informing Participants of the Response Bias of an Automated Target Recognition System on Trust and Reliance Behavior. Human Factors, 65(2), 189–199. https://doi.org/10.1177/00187208211021711

Lee & See (2004). Trust in Automation: Designing for Appropriate Reliance. Human Factors, 46(1), 50–80. https://doi.org/10.1518/hfes.46.1.50_30392

Madhavan, Wiegmann, D. A., & Lacson, F. C. (2006). Automation Failures on Tasks Easily Performed by Operators Undermine Trust in Automated Aids. Human Factors, 48(2), 241–256. https://doi.org/10.1518/001872006777724408

Madhavan & Wiegmann, D. A. (2007). Similarities and differences between human-human and human-automation trust: an integrative review. Theoretical Issues in Ergonomics Science, 8(4), 277–301. https://doi.org/10.1080/14639220500337708

Manzey, Gérard, & Wiczorek (2014). Decision-making and response strategies in interaction with alarms: The impact of alarm reliability, availability of alarm validity information and workload. Ergonomics, 57(12), 1833–1855. https://doi.org/10.1080/00140139.2014.957732

Merritt, Huber, Lachapell-Unnerstall, & Lee (2014). Continuous Calibration of Trust in Automated Systems. Retrieved February 24, 2023, from https://apps.dtic.mil/dtic/tr/fulltext/u2/a606748.pdf

Merritt, S. M. (2011). Affective processes in human-automation interactions. Human Factors, 53(4), 356–370. https://doi.org/10.1177/001872081141191

Merritt, S. M., Sinha, Curran, P. G., & Ilgen, D. R. (2015). Attitudinal predictors of relative reliance on human vs. automated advisors. International Journal of Human Factors and Ergonomics, 3(3-4), 327–345. https://doi.org/10.1504/IJHFE.2015.072982

Reiner, Hollands, Jamieson, Salas, Marquez, & Gore (2017). Target Detection and Identification Performance Using an Automatic Target Detection System. Human Factors, 59(2), 242–258. https://doi.org/10.1177/0018720816670768

Wickens, C. D., Helton, W. S., Hollands, J. G., & Banbury (2021). Engineering psychology and human performance (5th ed.). Routledge.

Wickens, C. D., & Dixon, S. R. (2007). The benefits of imperfect diagnostic automation: A synthesis of the literature. Theoretical Issues in Ergonomics Science, 8(3), 201–212. https://doi.org/10.1080/14639220500370105

Williams, K. J., Yuh, M. S., & Jain (2023). A Computational Model of Coupled Human Trust and Self-Confidence Dynamics. ACM Transactions on Human-Robot Interaction. https://doi.org/10.1145/3594715

Interpersonal and Human-Automation Trust in an Underwater Mine Detection Task
