Abstract
Trust development will play a critical role in remote vehicle operations transitioning from automated (e.g., requiring human oversight) to autonomous systems. Factors that affect trust development were collected during a high-fidelity remote uncrewed aerial system (UAS) simulation. Six UAS operators participated in this study, which consisted of 17 trials across two days per participant. Trust in two highly automated systems was measured pre- and post-study. Perceived risk and familiarity with the systems were measured before the study. Main effects showed that performance-based trust and purpose-based trust increased between the pre- and post-study measurements. System familiarity predicted process-based trust. An interaction indicated that operators who rated the systems as riskier showed an increase on a single-item trust scale between the pre- and post-study measurements, whereas operators who rated the systems as less risky maintained a higher trust rating. Individual differences showed that operators adapted to why the automation was being used, and trust improved between measurements. Qualitative analysis of open-ended responses revealed themes related to behavioral responses of the aircraft and transparency issues with the automated systems. Results can be used to support training interventions and design recommendations for appropriate trust in increasingly autonomous remote operations, as well as to guide future research.
The Advanced Air Mobility (AAM) concept envisions a diverse set of emerging aerial technologies. These technologies will enable varying mission types in urban and rural environments, with applications ranging from commercial transport and air taxi services (e.g., Urban Air Mobility) to drone surveillance and inspection operations (National Academies of Sciences, Engineering, and Medicine, 2020). To promote the scalability potential for AAM applications, emerging aviation markets are exploring remote vehicle operations that allow fewer human operators (m) to manage more vehicles (N), referred to as m:N (Aubuchon et al., 2022). These types of operations will be supported by increasingly autonomous systems, which will shift authority toward the technology (Pritchett et al., 2018). The term “increasingly autonomous” characterizes technology that spans “the spectrum of system capabilities that begin with the abilities of current automatic systems, such as autopiloted and remotely piloted (non-autonomous) unmanned aircraft, and progress toward the highly sophisticated systems that would be needed to enable the extreme cases” (National Research Council, 2014, p. 2; cf. “increasingly capable automation,” Chiou & Lee, 2023). Because this term encompasses both automation and autonomous systems, providing a distinction between these two classifications is helpful.
We adopt a definition of automation as “a device or system that accomplishes (partially or fully) a function that was previously, or conceivably could be, carried out (partially or fully) by a human operator” (Parasuraman et al., 2000, p. 287). This definition encompasses the notion of level of automation (LOA), which can range from fully manual (i.e., the computer offers no assistance, and the human must take all decisions and actions) to fully automated (the computer decides everything, acts autonomously, ignoring the human; cf. Sheridan & Verplank, 1978). Surpassing automation, Kaber (2018, p. 408) describes an autonomous system (agent) as requiring three characteristics: (1) an autonomous agent is viable in a target context; (2) an autonomous agent possesses independence or capacity for function/performance without assistance from other agents (e.g., humans); and (3) an autonomous agent possesses self-governance in goal formation and fulfillment of roles. From this perspective, even if a system has a high LOA, if it requires monitoring and the possibility of human intervention in a particular operating context, then it is not independent and therefore not an autonomous system (“Tenet 2” outlined by Kaber, 2018). Instead, the technology is automation regardless of technical capacity to act autonomously.
Viable, mature AAM applications will entail a paradigm shift where operations are accomplished autonomously without human intervention or oversight (see Goodrich & Theodore, 2021). Yet, despite the growing interest in developing and implementing autonomous systems, regulators and users often constrain or under-use potentially autonomous capacities because of a lack of trust in systems to operate with greater responsibility and authority over the execution of safety-critical tasks (Kaber, 2018). Instead, although systems may be highly automated, human oversight is often employed to provide a layer of resiliency in case of system failure (cf. Holbrook et al., 2019). Understanding the developmental process of trust in increasingly autonomous technologies will play a critical role in charting the path from remote vehicle operations that leverage automation (e.g., requiring human oversight) to scalable autonomous operations. The objective of the current study was to examine factors that affect the development of operator trust in highly automated (increasingly autonomous) systems across a two-day, high-fidelity remote uncrewed aerial system (UAS) simulation. To frame this objective, however, we provide a brief overview of the developmental process of human-automation trust.
Human-Automation Trust Development
Over the last several decades, researchers have used trust to predict and describe behaviors and intentions toward adopting, using, and interacting with automation (e.g., Chancey et al., 2015, 2017; Chiou & Lee, 2023; Hoff & Bashir, 2015; Lee & See, 2004; Muir, 1987; Sheridan, 1988, 2019a, 2019b; Sheridan et al., 1983; Sheridan & Hennessy, 1984; Sheridan & Verplank, 1978). Moreover, as with human teams (Salas et al., 2005), trust is a critical coordinating mechanism in human-autonomy teams (de Visser et al., 2020; O’Neill et al., 2022; Schaefer et al., 2019). Although appropriate trust facilitates efficient interactive behaviors with automation, if trust exceeds or falls short of system capabilities, then misuse or disuse, respectively, may occur (Lee & See, 2004). Disuse refers to failures resulting from an operator rejecting the capabilities of the automation and disabling, ignoring, or spending excessive time crosschecking the actions and decisions of the technology. Conversely, misuse refers to failures resulting from an operator inadvertently violating critical assumptions, not monitoring the automation enough, or depending on the automation when it should not be used (see Parasuraman & Riley, 1997). Thus, to provide clarity, a definition of trust is required, as it is often identified as the operant variable in disparate human-automation interaction and human-autonomy teaming paradigms.
Lee and See (2004) highlight two important components of the construct. First, the trustee is responsible for advancing the goal(s) of the trustor, in which the trustee performs a particular action on their behalf. Second, the trustor must willingly assume some risk by delegating responsibility to the trustee, which introduces an aspect of risk perception and vulnerability on the part of the trustor (Stuck et al., 2022; cf. Mayer et al., 1995). If an operator does not perceive the risk associated with placing a technology in charge of achieving their goal(s), then trust will not greatly affect intentions (Chancey, 2020) or behaviors (Chancey et al., 2017; see Stuck et al., 2022 for overview of risk in the context of human-automation trust). Reflecting these perspectives, trust is defined as “an attitude that an agent will help achieve an individual’s goals in a situation characterized by uncertainty and vulnerability” (Lee & See, 2004, p. 51).
Researchers often denote goal-oriented information that supports trust along a dimension of attributional abstraction, which ranges from being based on observable behaviors of the trustee to being based on more abstract concepts in reference to the trustee. Similarly, Lee and See (2004; Lee & Moray, 1992) proposed goal-oriented informational bases for trust in automation: performance, process, and purpose.
Performance-based trust describes what the automation does. For this component, observable automation actions that reliably achieve the operator’s goals will lead to greater trust. To illustrate, a remote operator’s trust in an autopilot system will increase in proportion to the successful observed, experienced, and reported flights that are safely completed with little or no unanticipated operator interventions. Yet if the automation fails to execute the task it was designed for, performance-based trust can significantly decrease (see Politowicz et al., 2021, for UAS example).
Process-based trust describes how the automation operates and corresponds to the appropriateness of the automation’s algorithms in achieving the operator’s goals. For this component, automation that is understandable will lead to appropriate trust, which can occur through transparent design (Chen et al., 2014; Lyons, 2013) and/or providing a global explanation for how the technology works (Klein et al., 2021), possibly through training (Cohen et al., 1998). To illustrate, a remote operator is more likely to trust detect-and-avoid automation if they understand (via training or transparency) that the system’s general logic causes the aircraft to circle back to missed waypoints after resolving a conflict.
Purpose-based trust describes why the automation was developed and corresponds to how well the designer’s intent has been communicated to the operator. This closely resembles Rempel et al.’s (1985) concept of faith-based trust, where trust is based on the belief that the automation can be depended upon in the absence of behavioral observation of the trustee. For this component, automation that achieves the goals the operator understands it was designed to achieve will lead to appropriate trust. To illustrate, a remote operator is more likely to appropriately trust an autopilot that executes an unplanned course correction if they understand that the system is designed to avoid conflicts when navigating between waypoints and is simply deconflicting its flight path with another vehicle before reengaging with the planned path.
Lee and See (2004) note that all three of these trust bases can mutually affect the development of each other (see Muir, 1987; Li et al., 2019, for recognition of this with similar trust bases). Drawing from the autopilot examples, if the operator observes aircraft avoiding other vehicles during an operation (performance-based), then this could update the operator’s understanding of the autopilot to include detect-and-avoid functionality (purpose-based). Similarly, the operator could have learned in training that the autopilot has detect-and-avoid functionality (purpose-based), which could provide some understanding for how it works (process-based) and be confirmed during operations (performance-based). When trust is based on multiple mutually supportive and confirming dimensions it can be robust, yet when it is based on a single dimension it can be fragile (Lee & See, 2004).
Lee and See (2004) proposed that although these bases of trust are largely influenced by affective processes (i.e., moods, feelings, and emotions), analytical and analogical processes also determine the assimilation of this information. Analytically, trust reflects accumulated knowledge from previous interactions, which is used to evaluate the behavior of the automation rationally and probabilistically. This process, however, may overemphasize the cognitive capability of the decision maker to effectively engage in conscious calculations or exhaustively compare alternatives. Compounding this effect, the nondeterministic algorithms leveraged by increasingly autonomous systems may not allow the operator insight into system behaviors. Analytical processes, therefore, are likely complemented by less cognitively demanding methods such as analogical judgments, which can develop through direct or indirect observations and assumptions based on existing category memberships. Lee and See (2004) propose, however, that affective processes largely influence the effect of trust on behavior, because trust is not only thought about but also felt (Fine & Holyfield, 1996). When a system does not meet expectations of achieving an important goal, then trust is violated, and emotions signal that a change in posture toward that system is needed. Specifically, emotions guide behaviors when rules do not apply or when cognitive resources are not available to make a rational choice.
Because trust is based on goal-oriented information assimilated via affective, analogical, and analytical paths, it is not static. To this point, recent research has indicated the importance of considering trust as a dynamic construct that can change over time (e.g., de Visser et al., 2020; Guo & Yang, 2021; Yang, Guo, & Schemanske, 2023; Yang, Schemanske, & Searle, 2023). Hoff and Bashir (2015) describe learned trust, which is based on current and past interactions with the target automation and is directly influenced by preexisting knowledge and current performance of the system. Learned trust is separated into two types. Initial learned trust, which represents trust prior to interacting with a system, is based on preexisting knowledge that can be derived from expectations, reputation (e.g., brand and hearsay), experience with the system or a similar system, and general understanding of that system (i.e., system familiarity; cf. Bliss et al., 1995). Indeed, Tenhundfeld et al. (2019) concluded that familiarity is essential to trust development, showing that demonstrating the autopark feature in a Tesla Model X automobile led to fewer interventions than simply providing information about the feature. The authors stated this result was consistent with the learned trust concept. Initial learned trust determines the early interaction and dependence strategy that an operator will adopt and is likely supported by purpose-based and process-based information (i.e., existing information the operator has about how the automation works and why it was developed). Dynamic learned trust, however, represents trust formation during an interaction with the automation, which can be impacted by system reliability, predictability, and errors. Because this trust develops during interactions in which the system’s behaviors are observed, it is more likely supported by performance-based information (i.e., what the automation is doing).
Theoretically, operator trust in systems that are highly automated (or may possess the capacity to act autonomously) should develop as a function of preexisting knowledge and exposure to those systems. It is unclear, however, if these theoretical predictions generalize to expert operators managing increasingly autonomous UAS. Understanding that developmental process will be important to chart how and when increasingly autonomous technologies could be leveraged to support scalable remote vehicle operations.
Purpose and Hypotheses
Human-automation trust has been studied extensively in controlled laboratory settings (though not exclusively, yet see studies reviewed in Hoff & Bashir, 2015, and meta-analysis in Schaefer et al., 2016), which has been useful to evaluate theoretical predictions such as those outlined in Lee and See (2004). In controlled laboratory settings, the experimental task is simplified by omitting factors present in operational environments to maximize internal validity (i.e., concerns whether the relationship between two variables is causal in nature; see Shadish et al., 2002). Yet abstracting away complexities and controlling for extraneous variables to isolate a causal effect raises doubts about the extent to which that relationship generalizes beyond a specific experiment (i.e., concerns of external validity; Shadish et al., 2002), particularly when settings and participants do not resemble the application from which the effect is being studied (i.e., concerns of ecological validity; see Chancey et al., 2023). Evaluations conducted in high-fidelity simulators sampling from an expert population can help determine whether results obtained under controlled laboratory settings are able to overcome the numerous additional factors not addressed or held constant (Vicente, 1997).
Therefore, the purpose of the current study was to examine the factors that affect expert operator trust in increasingly autonomous systems during a high-fidelity remote UAS simulation. We broadly use the term “high-fidelity” because the testing environment and equipment are also used to remotely command and control live vehicles (see Liu et al., 2009, for description of simulation fidelity). Indeed, although not the focus of the results presented in the current work, a parallel goal of this simulation activity was to support integration, testing, and safety risk assessment in a simulated environment prior to live flight operations (see Glaab et al., 2022). Hence, this simulation activity provided an opportunity to investigate operator trust toward increasingly autonomous technologies in an ecologically representative UAS environment.
In the current study, trained and experienced ground control station operator (GCSO) participants directed simulated UAS to complete multiple waypoint-following scenarios designed to exercise and stress-test onboard automated systems (i.e., a parallel hardware-in-the-loop study was also conducted). Participants rated their trust toward two highly automated systems before and after the study. Based on the review of human-automation trust development, we proposed three hypotheses (H1-H3, below). It should be noted that we did not experimentally manipulate the actual or perceived reliability of the target systems. Therefore, we did not hypothesize directionality in trust outcome measures.
H1: Performance-based trust will change between the pre- and post-measurement times, because it is based on observing the behaviors and actions of automated systems (Lee & See, 2004; Politowicz et al., 2021).
H2: Based on the concept of initial learned trust, operator familiarity and previous times flown with these systems will predict process- and purpose-based trust (Hoff & Bashir, 2015; Tenhundfeld et al., 2019).
H3: Perceived risk of the automated systems will predict subjective trust ratings generally (Lee & See, 2004; Stuck et al., 2022).
Method
Participants
Six male UAS GCSO personnel (M age = 37.33, SD age = 9.27), comprising NASA civil servants and contractor employees, participated in this study. Qualified GCSO participants had to have received Crew Resource Management (CRM) training and training for supporting a safety pilot (i.e., the pilot in command located on the flight range), and had to have supported live UAS operations as a GCSO prior to the current study. Participants self-reported an average of 5.75 hours of video game use per week (SD = 7.57; Min = 0, Max = 20) and 39.67 hours of computer use per week (SD = 19.61; Min = 8, Max = 60). This research complied with the American Psychological Association Code of Ethics and was approved by NASA’s Institutional Review Board. Informed consent was obtained from each participant.
GCSO Role and Apparatuses
GCSO Role
The GCSO is responsible for all preflight, inflight, and postflight activities associated with the communication to an aircraft. The GCSO manages the operation of the aircraft by means of a ground control station (GCS), via a computer interface with an onboard flight management system through a command and control communications link (see LPR 1710.16 5.1.5.4: https://nodis3.gsfc.nasa.gov/npg_img/N_PR_7900_003D_/N_PR_7900_003D__Chapter5.pdf). As a GCSO, the participant was expected to command and control a simulated small UAS multi-rotor vehicle along a predefined flight path using GCS software. The GCSO could command actions of Takeoff (vehicle gains altitude from a ground location), Hold (pauses the aircraft along the flight path), Return to Launch (vehicle automatically returns to the takeoff location), Land (vehicle descends in altitude until it lands), and Load/Modify new flight plan missions. These actions were commanded while monitoring the health and position status of the vehicle and making standard callouts (e.g., “proceeding to waypoint 5”). Using onboard automated systems, the GCSO could also take actions to perform automated detect-and-avoid maneuvers and execute emergency or contingency landings (see Automated Systems section below).
Remote Operations for Autonomous Missions (ROAM) UAS Operations Center and GCSO Workstation
The study was conducted in the ROAM UAS Operations Center located at NASA Langley Research Center (see Buck et al., 2023). In addition to individual workstations, ROAM is equipped with a large-format video wall at the front of the room (Figure 1). The GCSO workstation was the primary information and interaction location for the GCSO and consisted of a Dell Precision 7820 Tower, two Dell P2721Q 27” 16:9 4K USB-C Monitors (both in portrait orientation), a Dell P2418HT 24” 16:9 10-Point Touch Screen IPS Monitor (tilted at a 45° angle), a QWERTY keyboard, mouse, phone, and a Tobii Pro Nano eye tracker mounted to the touchscreen (Figure 2). The touchscreen was the primary GCSO display and presented the Measuring Performance for Autonomy Teaming with Humans (MPATH) GCS interface (see Politowicz et al., 2023). The GCSOs interacted with MPATH via mouse (note: the touch functionality option was not used by participants).
Figure 1. ROAM UAS operations center layout.
Figure 2. GCSO workstation layout.

Automated Systems
Two onboard automated systems served as the targets of the human-automation trust questionnaire: Independent Configurable Architecture for Reliable Operations of Unmanned Systems (ICAROUS) and Safe2Ditch. The ICAROUS system is an architecture with a detect-and-avoid capability that enables automated mid-flight reroutes, supporting automated responses to potential traffic incursions and geofence (i.e., pre-established virtual boundaries) breaches (see Consiglio et al., 2016). The Safe2Ditch system is a contingency management tool that contains a pre-loaded (and updatable) set of “ditch site” locations with the position, size, and reliability of each site (see Glaab et al., 2018). During an off-nominal event Safe2Ditch can be activated from MPATH, where Safe2Ditch identifies the location of an appropriate ditch site and then communicates (via ICAROUS) to the autopilot to land the vehicle at the selected ditch site. Although Safe2Ditch was technically integrated with ICAROUS, from a user-perspective the detect-and-avoid and contingency management functions were represented to the participants as independent functions performed by separate automated systems (i.e., ICAROUS and Safe2Ditch, respectively) and each system was represented as two separate user interface elements on the MPATH display. Moreover, participants were not made explicitly aware of the underlying software architecture connecting ICAROUS and Safe2Ditch.
Design
We employed a 2 (System: ICAROUS, Safe2Ditch) × 2 (Time: Pre-Study, Post-Study) within-subjects design, where ICAROUS and Safe2Ditch were the target systems of the trust questionnaires and Pre- versus Post-Study indicated the temporal order of when the trust questionnaires were administered. Trust was the only dependent/outcome variable analyzed in the current study and was measured using the human-automation trust questionnaire reported in Chancey et al. (2017; see online appendix for full questionnaire). The questionnaire is compatible with the Lee and See (2004) theoretical perspective and consists of three factors (five items each) measuring performance-based trust (Performance), process-based trust (Process), and purpose-based trust (Purpose), a structure that has been empirically established using a confirmatory factor analysis approach (Yamani et al., in press). Composite trust was the average of all 15 items. Note that the trust questionnaire for ICAROUS focused on only the collision avoidance functionality and did not reference the geofence avoidance capability. We also included a general trust measure (General Trust), which was a single-item scale for each system (e.g., “I trust Safe2Ditch to safely land my vehicle during an emergency”).

The perceived risk of the system (Risk), familiarity with the system (Familiarity), and times flown with the system (Times Flown) were also measured before the study. Risk was measured using a five-item scale developed by Clothier et al. (2015), administered as two separate questionnaires targeting ICAROUS and Safe2Ditch (cf. perceived relational risk outlined by Stuck et al., 2022). Familiarity was measured using a one-item scale that asked participants to rate their familiarity with ICAROUS and Safe2Ditch on two separate scales (i.e., “How familiar are you with Safe2Ditch/ICAROUS?” 1 = Not at All Familiar, 7 = Extremely Familiar). Times Flown was measured by asking participants to indicate how many times they had used ICAROUS and Safe2Ditch (i.e., “Approximately how many times have you flown a vehicle with Safe2Ditch/ICAROUS on board and engaged [provide a single number]?”; note, these were two separate questions, one for each target automation). Following the completion of each scenario, participants were given the opportunity to optionally provide any additional comments via open-ended responses on an iPad tablet.
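To make the scoring concrete, the following is a minimal R sketch of how the subscale and composite scores described above could be computed; the data frame (trust_items) and item column names (perf_1 through purp_5) are hypothetical placeholders for illustration, not the study's actual variable names.

```r
# Minimal scoring sketch (hypothetical data frame and column names).
# Assumes one row per participant x system x time, with the 15 trust items rated 1-7.
library(dplyr)

trust_scores <- trust_items |>
  mutate(
    Performance = rowMeans(across(perf_1:perf_5)),  # performance-based trust (5 items)
    Process     = rowMeans(across(proc_1:proc_5)),  # process-based trust (5 items)
    Purpose     = rowMeans(across(purp_1:purp_5)),  # purpose-based trust (5 items)
    Composite   = rowMeans(across(c(perf_1:perf_5, proc_1:proc_5, purp_1:purp_5)))  # all 15 items
  )
```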
Procedure
We tested participants individually, and each session occurred across two business days (approximately 16 hours per participant). Participants received a 1-hour lunch break and three 15-minute breaks each day. Upon arrival, we asked the participant to read and sign an informed consent form, a Privacy Act notice, and a demographics and background information form. The participant then completed a series of pre-study questionnaires: Trust (Performance, Process, Purpose, General Trust), Risk, Familiarity, and Times Flown. These questionnaires were followed by a 1-hour familiarization session covering ROAM, the GCSO workstation, and the study procedures. The participant was then asked to act as the GCSO across nine simulated UAS scenarios.
Scenario Descriptions.
Results
Descriptive Statistics for Trust Factors, Familiarity, Times Flown, and Risk Across System and Time.
Note. SD in parentheses. All responses except for Times Flown are on a scale of 1 (low) to 7 (high). Gen. = General Trust, Perf. = Performance-Based Trust, Pro. = Process-Based Trust, Purp. = Purpose-Based Trust, Comp. = Composite Trust, Fam. = Familiarity, TF = Times Flown.
We analyzed the trust ratings using linear mixed-effects models. Participant was the only random effect, denoted by (1|ID); all other tested effects were fixed. Mixed-effects modeling allows for explicit modeling of correlated errors that can arise from repeated measures, better handling of missing or unbalanced data compared to analysis of variance, and extensions to non-normally distributed outcomes if appropriate (Seltman, 2012; Singer, 1998). Familiarity was highly correlated with Times Flown for both systems (ICAROUS, r = .949, p < .001; Safe2Ditch, r = .987, p < .001), indicating these measures were redundant. Therefore, only Familiarity and Risk were included in equation (1).
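Although the exact form of equation (1) is not reproduced in this version, the following is a minimal lme4 sketch of the kind of model this description implies; the data frame (trust_data) and the particular fixed-effect terms shown are illustrative assumptions, not the authors' actual code.

```r
library(lme4)
library(lmerTest)  # provides F-tests for fixed effects (Satterthwaite degrees of freedom)

# Hypothetical long-format data: one row per participant (ID) x System x Time,
# with a subscale trust score plus the pre-study Familiarity and Risk ratings.
m_full <- lmer(
  Performance ~ System * Time + Familiarity + Risk + (1 | ID),  # (1|ID) = random intercept per participant
  data = trust_data
)

anova(m_full)           # F-tests for the fixed effects
plot(m_full)            # residuals vs. fitted: linearity and constant variance
qqnorm(resid(m_full))   # approximate normality of residuals
```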
We treated the trust factor data as continuous rather than ordinal because they are averages of Likert items. Norman (2010) has shown that parametric analysis is robust to the skewness and non-normality that may be encountered in this type of data, and that small sample sizes do not, in themselves, preclude parametric analysis given this robustness. Nevertheless, to address potential issues due to the small sample of six GCSOs, we checked model assumptions and calculated effect sizes (partial η2). We consulted residual plots of the models to check the assumptions of linearity, constant variance, and independent and normally distributed errors.
After fitting the full mixed model and checking assumptions, we used a stepwise procedure for variable selection, testing the random effect and the marginal fixed-effect terms until only significant terms remained. Without a random effect for participant (i.e., assuming no inherent differences among the participants), the models revert to standard linear regression. The statistical significance threshold (α) was set to p ≤ .01 to compensate for multiple testing (i.e., .05/5 = .01; Bonferroni method). We used R Statistical Software (v4.1.2; https://www.R-project.org/) to model the data with the lme4 package (Bates et al., 2015), and the ggplot2 package (Wickham, 2016) to create the plots.
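As an illustration only, term-by-term selection of this kind could look like the following in R, continuing with the hypothetical model and variable names from the sketch above.

```r
# Test a marginal fixed-effect term by comparing the full model to a reduced model.
m_no_interaction <- update(m_full, . ~ . - System:Time)
anova(m_full, m_no_interaction)   # likelihood-ratio test for the System x Time interaction

# Assess the participant random effect by refitting as ordinary linear regression
# and comparing information criteria.
m_no_ranef <- lm(Performance ~ System * Time + Familiarity + Risk, data = trust_data)
BIC(m_full); BIC(m_no_ranef)

# Bonferroni-corrected significance threshold used for retaining terms.
alpha <- 0.05 / 5   # = .01
```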
Bayes factors (BF) were also calculated to supplement null-hypothesis significance testing. BFs were approximated using Bayesian Information Criterion (BIC) calculated for models with and without variables of interest. Equation (2) shows the relationship between BF and BIC for a given null and alternative hypothesis, for example, a model of trust based only on Time compared to a model without Time (Wagenmakers, 2007).
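The exact rendering of equation (2) is not reproduced in this version; based on the Wagenmakers (2007) BIC approximation it cites and on the interpretation used below (BF > 1 favoring the model that includes the variable of interest), the relationship is approximately

$$\mathrm{BF}_{10} \approx \exp\!\left(\frac{\mathrm{BIC}(H_{0}) - \mathrm{BIC}(H_{1})}{2}\right),$$

where H1 is the model including the variable of interest (e.g., Time) and H0 is the reduced model without it.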
We adopted the terms of Nuijten et al. (2015) to describe each BF, or the strength of evidence supporting a particular model: a BF greater than 100 is extreme evidence for the model, BF 30–100 is very strong evidence, BF 10–30 is strong evidence, BF 3–10 is moderate evidence, BF 1–3 is anecdotal evidence, a BF of 1 is no evidence for or against the model, and a BF less than 1 indicates evidence for the reduced model.
Composite Trust
No predictor or independent variables were found to have statistically significant relationships with composite trust score.
Performance-Based Trust
There was a significant main effect of Time on Performance, F(1, 22) = 12.01, p = .002, partial η2 = .35, where ratings were significantly higher Post-Study (M = 4.72, SD = .93) than Pre-Study (M = 3.10, SD = 1.32; see Figure 3). Confirming this, a BF of 38.08 indicated very strong evidence for the main effect of Time on Performance. Risk and Familiarity did not predict Performance, nor was a significant effect of System on Performance observed.
Figure 3. Box plots for the effect of Time on performance-based trust.
Process-Based Trust
Familiarity significantly predicted Process, F(1, 22) = 33.25, p < .001, partial η2 = .60. Confirming this, a BF of 12,843.44 indicated extreme evidence for the effect of Familiarity on Process. Figure 4 plots the linear relationship between Process and Familiarity with a 95% confidence interval around the regression line. Specifically, a one-point increase in Familiarity rating corresponds to an expected increase of 0.62 in Process. Risk was not a significant predictor of Process, nor was there a significant effect of System or Time on Process.
Figure 4. Positive main effect of familiarity on process-based trust. Note. Linear regression line shown with 95% confidence interval.
Purpose-Based Trust
There was a significant main effect of Time on Purpose, F(1, 17) = 9.38, p = .01, partial η2 = .36 (BF = 8.58, moderate evidence for this effect). Yet a significant random effect of participant indicated the GCSOs had varying Purpose ratings at the start (BF = 27.28, strong evidence for this effect). Specifically, Purpose ratings started at different points, but those ratings increased after the study. Figure 5 shows the Purpose rating changes from Pre-Study to Post-Study across participants. Note that the two systems are differentiated for plotting but were not significantly different.
Figure 5. Change in purpose-based trust over time, varying by participant.
General Trust
A significant interaction between Risk and Time on General Trust was observed, F(1, 19) = 14.30, p = .001, partial η2 = .43. For participants who gave higher Risk ratings, General Trust ratings were lower before participating in the study and increased after the study. For participants who gave lower Risk ratings, General Trust ratings tended to be higher both before and after the study (Figure 6). Confirming this, a BF of 171.24 indicated extreme evidence for the Risk × Time interaction. Additionally, a significant main effect of System on General Trust was also observed, F(1, 19) = 14.30, p = .004, partial η2 = .37, where trust in ICAROUS (M = 3.50, SD = 1.31) was significantly less than trust in Safe2Ditch (M = 4.33, SD = 1.61). Confirming this, a BF of 49.50 indicated very strong evidence for the main effect of System on General Trust. Familiarity did not significantly predict General Trust ratings.
Figure 6. General trust is impacted by risk differently for pre-study and post-study and differs by system. Note. Shaded bounds represent 95% confidence intervals.
Qualitative Results
Following a similar approach outlined by Carmody et al. (2022; cf. Braun & Clarke, 2006), we conducted a qualitative analysis on the open-ended responses participants provided at the end of each trial to contextualize the quantitative trust ratings. Of the scenarios using the target systems (i.e., scenarios 2, 5, 6, 7, and 8), all participants provided a response following the completion of at least one scenario except for Participant 3. Participant 1 provided a single response after one of these scenarios, yet the response did not relate to either of the target systems: “Time mission start lead me to miss some steps on the test card.” Therefore, qualitative data were generated from Participant 2 (9 responses), Participant 4 (4 responses), Participant 5 (6 responses), and Participant 6 (9 responses). Of the total 28 responses, the average response length was 34.18 words (SD = 26, Min = 6, Max = 107) as counted by the word count functionality in Microsoft Word. Two researchers reviewed the 28 responses, generated codes, and then interactively worked together to compare and consolidate the initial coded data into overarching themes. This resulted in two categories of themes labeled as Operational Themes and Automation Themes.
Operational Themes
Frequency of Participant Responses Related to Operational Themes.
Note. Participant identification number in parentheses (e.g., P6 = Participant 6). Superscript number indicates how many times the participant’s comment was organized into a theme.
Automation Themes
Frequency of Participant Responses Related to Automation Themes.
Note. Participant identification number in parentheses (e.g., P6 = Participant 6). Superscript number indicates how many times the participant’s comment was organized into a theme.
ICAROUS Behaviors
Interestingly, participants commented on the behaviors of only ICAROUS and not Safe2Ditch (i.e., no Safe2Ditch behavior themes were identified). Additionally, there was at least one response for each scenario testing the ICAROUS detect-and-avoid functionality. The ICAROUS Behavior responses were mainly related to the discomfort with the resolution path of the aircraft in proximity to the geofence (e.g., “Was unsure how close [it] would get to the geofence. It appeared as if it may cross fence,” “I was uncomfortable with the track it took trying to acquire the flight path after going around the geofence. It made a wide sweeping right hand turn that almost broke the geofence boundary”) and remarks on the flight characteristics of the vehicle during the resolution (e.g., “ICAROUS appeared to cause the multi-rotor vehicle to behave like a fixed wing vehicle when reengaging waypoint 5/mission path,” “It turned like I would expect a fixed wing plane to. Instead of just stopping and hovering to point back in the correct direction”). Although each of the ICAROUS detect-and-avoid scenarios had at least one behavior-related response, these were mostly for the geofence avoidance scenario (scenario 2). Anecdotally, the proximity of the aircraft to the geofence boundary appeared close, yet never resulted in a breach. This likely contributed to the consistent responses among participants. ICAROUS Behavior responses for scenarios 5 and 6 were related to observations during a single landing sequence and a single delayed ICAROUS message during an incursion. Because the ICAROUS-specific trust questionnaire was focused on the collision avoidance functionality and not the geofence avoidance, it is difficult to know if or how the geofence avoidance responses relate to the trust results.
ICAROUS Transparency
ICAROUS transparency issues were mostly isolated to scenario 6, which was the high-threat traffic incursion that required the system to reroute around an intruder vehicle. The responses were all generally related to wanting more information about how ICAROUS works (e.g., “Need to know the boundaries of ICAROUS during active control”) and how to anticipate behaviors (e.g., “I would want to see the traffic bands to get insight when ICAROUS plans to take action,” “I would like to see a more clear indication that ICAROUS was actively intervening with the flight”). Although some of the responses were similar to the ICAROUS Behavior responses (which also have clear transparency implications), these were categorized as transparency issues because they were focused on wanting some insight (via display or explanation) for what ICAROUS would do (cf. performance-based trust) or how it works (cf. process-based trust) rather than simply reporting on the effects ICAROUS had on the behavior of the aircraft.
Safe2Ditch Transparency
Safe2Ditch transparency issues were noted across both scenarios where the aircraft did not complete the full route and instead Safe2Ditch was triggered, resulting in the aircraft proceeding to a pre-designated “ditch site.” Responses were similar and relatively evenly split across the two scenarios, including four of the six participants. Responses generally referred to being unclear as to which ditch site the aircraft would travel to (e.g., “Desire to see ditch site highlight when S2D [is] active,” “I would like a more definitive representation of what ditch site has been selected when I activated S2D,” “Would request a highlighted area around target ditch site”). Again, these comments were centered around wanting some insight, exclusively via display, for what ditch site it selected rather than comments on the actual aircraft behaviors Safe2Ditch produced.
Discussion
The purpose of this study was to examine factors that affect expert GCSO trust development in increasingly autonomous systems during a high-fidelity remote UAS simulation. Hypotheses were generally supported, which we discuss in turn along with non-hypothesized findings, limitations, and practical takeaways from the study.
Automation Exposure and Performance-Based Trust
Supporting H1, performance-based trust significantly changed (increased) between the pre- and post-study measurements. This is somewhat unsurprising, as the major intervention was exposure to the target automated systems, and neither system objectively failed to execute its function. This is consistent with the concept of dynamic learned trust, where observed system reliability more directly impacts performance-based trust (Hoff & Bashir, 2015). Moreover, performance-based trust in ICAROUS did not appear to be negatively impacted by the unusual behaviors displayed when avoiding the geofence. Participant responses indicated that ICAROUS produced behaviors and flight characteristics that were not expected (cf. local explaining activities, Klein et al., 2021). Yet this may be due to the questionnaire targeting the collision avoidance functionality exclusively, rather than also targeting the geofence avoidance functionality specifically. Performance-based trust would likely have dropped from the pre- to the post-study measure if these behaviors were treated as failures of the ICAROUS system as a whole (cf. system-wide trust; Keller & Rice, 2009; Geels-Blair et al., 2013), as negative experiences due to automation failures tend to have a greater influence on trust than positive experiences from automation successes (i.e., negativity bias; Yang, Guo, & Schemanske, 2023). Yet some research suggests a positive correlation between trust and general experience with a target technology (Schaefer et al., 2016) and, counterintuitively, trust can even increase with sustained failures (de Visser et al., 2006). As suggested by Tenhundfeld et al. (2019), this effect could be explained by an increase in understanding of the automation, or familiarity, which may have buffered the effects of a negativity bias.
Automation Familiarity and Trust
Partially supporting H2, familiarity with the systems predicted process-based trust. This is consistent with the concept of initial learned trust, which is based on preexisting knowledge such as understanding how a system works (Hoff & Bashir, 2015). These results echo the findings of Tenhundfeld et al.’s (2019) study, which was conducted in a real-world setting. Failing to fully support H2, however, purpose-based trust was not significantly affected by system familiarity; instead, individual differences predicted those ratings.
It is important to note that ICAROUS and Safe2Ditch are research software, which are continuously under development and being used in new ways. ICAROUS is an architecture with a detect-and-avoid capability, yet it is highly configurable to enable modular integration of mission-specific software components (see Consiglio et al., 2016). In a recent study, ICAROUS and Safe2Ditch were tightly integrated (see Duffy et al., 2020) and not functionally isolated for some scenarios as they were in the current study (see Table 1). It makes sense, therefore, that individual differences predicted purpose-based trust ratings and that, like performance-based trust, purpose-based trust changed between the pre- and post-study measures as information regarding why the automation was being used in this study was assimilated.
Similarly, automation capabilities in some personal ground vehicles may change based on “over-the-air” software updates. For example, Endsley (2017) noted unanticipated mode interactions and emergent behaviors with a Tesla Model S, where software updates linked the behaviors of previously independent automation modes in certain circumstances. She reported that despite incidents of automation-related issues, her trust in the vehicle automation increased over the period of study, which she attributed to software updates and to adapting to what the automation could or could not do. The GCSOs seemed to adapt to why the automation was being used in the current study (i.e., purpose-based trust), and trust improved between measurement times as it was supported by observing stable performance (i.e., performance-based trust) and familiarity with how the automation works (i.e., process-based trust). This is an interesting example of how one dimension of trust (i.e., performance-based) may influence another (i.e., purpose-based) in a high-fidelity simulation with experienced participants. Future research should be conducted to confirm this, however, as we observed only the isolated effects of exposure on two separate dimensions of trust and did not analyze the sequential effects of one base of trust on another.
Perceived Risk and Trust
Although these were simulated operations, we suspected that the professional GCSO participants would feel a level of perceived risk toward the automated systems for two reasons. First, the simulated operations were preparation for a follow-on live-flight version, where attention to safety was a critical factor for proceeding to that step. Even though the flights are mostly automated, the GCSO is responsible for safety of flight and is required to take over for the automation if it fails to function as intended (in both simulation and live-flight operations). Second, the participants are professional GCSOs hired to perform these types of operations. This study placed participants in a situation where they were asked to display the requisite skills to act as a GCSO, with the associated risk of failing to meet performance expectations (e.g., failing to take over for the automation in the event it fails). Supporting H3, participants who rated the systems as risky before the study tended to also trust the systems less, but then rated the systems as more trustworthy after being exposed to them. Participants who rated the systems as less risky tended to give higher trust ratings both before and after the study. As posited by Stuck et al. (2022), the operator’s subjective trust in the automation was related to their perceived relational risk (i.e., “…belief about the probability and/or feeling that interacting with a specific system…with which a user has a personal history or historical knowledge of, has potential negative outcomes” p. 503). Yet it should be noted that perceived risk of the systems did not predict any of the other theory-based trust factors (Performance, Process, and Purpose) and was isolated to the one-item, atheoretical trust measure. Although this result adds to the empirical evidence highlighting the relationship between automation trust and perceived risk (e.g., Chancey et al., 2017; Lyons & Stokes, 2012; Sato et al., 2020), a more consistent and careful consideration of risk should be pursued in future research (see Stuck et al., 2022, for recommendations).
System-Specific Trust
Though not hypothesized, participants rated Safe2Ditch as more trustworthy than ICAROUS on the single-item measure. Although Safe2Ditch was technically integrated with ICAROUS, we suspected that participants would evaluate each as a functionally separate system. From a user perspective, ICAROUS and Safe2Ditch are represented as separate user interface elements on the MPATH display, but these systems have not always been integrated or presented as such to GCSOs in previous flight activities. Supporting this, responses for detect-and-avoid scenarios were coded as ICAROUS-specific comments, and responses for contingency management scenarios were coded as Safe2Ditch-specific comments. There was, however, one response that acknowledged the integration of the systems: “I didn’t realize ICAROUS and S2D [Safe2Ditch] were interconnected until this run.” This comment seems to reflect the assimilation of information regarding how ICAROUS and Safe2Ditch were configured for this particular study as the participant was exposed to the automation (see Automation Familiarity and Trust section above). Nevertheless, we did observe a significant difference in the single-item trust measure between the systems.
The lower trust score for ICAROUS could be because there were more detect-and-avoid scenarios than contingency management scenarios (i.e., there were three ICAROUS-specific scenarios compared to two Safe2Ditch-specific scenarios). The effect could also be partially explained by the qualitative results. Participants provided six responses regarding unexpected behaviors ICAROUS produced on the aircraft and five ICAROUS-related transparency issue responses. Safe2Ditch, however, received seven comments concerning transparency issues exclusively. It is possible that the behaviors ICAROUS produced on the aircraft resulted in the lower trust ratings, as both systems had a similar number of transparency issue-related comments. Yet the ICAROUS-specific trust questionnaire targeted the collision avoidance functionality and not the geofence avoidance functionality. Because of this, we are not able to make a clear connection between the geofence avoidance behaviors, which accounted for four of the six responses, and the trust questionnaire results. Interestingly, however, there was no corresponding significant difference between the two systems on the performance-based trust measure, which might be expected if the geofence avoidance behaviors were driving the difference in the single-item scale (see Automation Exposure and Performance-Based Trust section). Again, this was an effect observed on the single-item trust measure, which does not align with any particular theoretical perspective.
Limitations
This study was not without limitations. First, due to operational constraints, we did not measure trust following individual scenarios using the target automated systems. Moreover, because scenarios were not given in the same order, a trust measure given at the end of the first day would have been difficult to interpret coherently across participants. Because we measured trust at only two points in time (i.e., two “snapshots”), we were unable to investigate the dynamics of trust either building or diminishing during the interactions with the target automated systems (see Yang, Guo, & Schemanske, 2023, for discussion). In controlled laboratory settings, trust in automated systems has been shown to be quite dynamic as participants acclimate to system characteristics (e.g., Bliss et al., 2020; Lee & Moray, 1992, 1994; Yang, Schemanske, & Searle, 2023). As predicted, performance-based trust changed between the pre- and post-study measurement times. Yet without additional measurement points, it is not possible to provide a more nuanced depiction for how trust fluctuated based on interactions with the target automated systems or to what degree trust may or may not have stabilized at the post-study measurement time.
Second, the study lacked behavioral outcome measures. Generally, the purpose for investigating trust (and other predictor variables) is to determine automation use and how that use affects joint human-automation performance (Parasuraman & Riley, 1997). In laboratory settings, it is often easier to experimentally impose event rates that require human intervention, to study the effects of trust on behavior (e.g., Chancey et al., 2015, 2017). For this study, however, the GCSO had minimal performance requirements during scenarios because the automation did not fail. Ecologically, the GCSO as a system monitor will plausibly be one of the favored AAM approaches for pairing humans with increasingly autonomous systems (a task that humans are poorly suited for; Warm et al., 2008). Future research should explore meaningful human control (Smith et al., 2021), scenarios requiring operator intervention in the context of high perceived situational risk (see Stuck et al., 2022), as well as out-of-the-loop unfamiliarity issues (Endsley & Kiris, 1995) that may arise from passive monitoring (Lee & Seppelt, 2012), vigilance (Warm et al., 2008), and complacency and bias in human-automation paradigms (Parasuraman & Manzey, 2010). Designing for the human to be a system monitor is not a new issue, and emerging remote vehicle operations will, unfortunately, likely move toward that “design solution” without input from the human factors community.
Contributions and Practical Takeaways
This study provides one of the more ecologically representative evaluations of human-automation trust in a UAS environment. Results largely confirmed theoretical predictions, such as that familiarity and perceived risk are important predictors of trust and that exposure to automated systems can impact trust development even in experienced operators who have previously interacted with a system. Several key practical takeaways can be drawn from the results of this study:
• Familiarity with a system, exposure to a system’s behavior, and individual differences of the operator each uniquely and independently contribute to the development of trust in an automated system across the three bases of trust (i.e., performance, process, and purpose). Disregarding any of these factors could lead to inadequate development of at least one of the three bases of trust, which would lead to more brittle overall trust in the automation. The unique effects of these factors can be characterized as follows: (a) individuals who are more familiar with a system tend to trust it more; (b) exposure to a reliable system can increase trust in that system; and (c) individual differences tend to influence trust more before exposure than after.
• Trust associated with a specific function/goal within an automated system may continue to increase even as the operator observes unusual behavior from that system while it performs a different function/goal. Consideration should be given to ensure appropriate calibration of trust across all of the functions performed by an automated system.
Conclusion
The current study documented factors that affected trust in a high-fidelity remote UAS simulation. Generally, studies investigating constructs such as trust are constrained to tightly controlled laboratory settings (e.g., Chancey et al., 2015, 2017), in which generalizability to the real world can be tenuous if results are not then confirmed in high-fidelity simulations and field studies (i.e., many factors are not controlled for in the real world, see Chancey et al., 2023; see also trust theory applied in an operational aviation context, Ho et al., 2017; Lyons et al., 2017). The results from this study indicate that human-automation trust develops according to theoretical predictions. Moreover, these results suggest trust may be used to help guide the adoption of increasingly autonomous systems in future AAM applications.
Supplemental Material
Supplemental Material - Human-Automation Trust Development as a Function of Automation Exposure, Familiarity, and Perceived Risk: A High-Fidelity Remotely Operated Aircraft Simulation
Supplemental Material for Human-Automation Trust Development as a Function of Automation Exposure, Familiarity, and Perceived Risk: A High-Fidelity Remotely Operated Aircraft Simulation by Eric T. Chancey, Michael S. Politowicz, Kathryn M. Ballard, James Unverricht, Bill K. Buck, and Steven Geuther in Journal of Cognitive Engineering and Decision Making
Acknowledgement
This work was supported by NASA’s Transformational Tools and Technologies Revolutionary Aviation Mobility Subproject and the Advanced Air Mobility High-Density Vertiplex Subproject. The views expressed are those of the authors and do not necessarily reflect the official policy or position of NASA or the U.S. Government.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was financially supported by two NASA projects.
Supplemental Material
Supplemental material for this article is available online.
References
