Abstract

Development of responsive automation necessitates a framework for studying human-automation interactions in a broad range of operating conditions. This study uses a novel experiment design involving multiple binary perturbations in different stimuli to elicit measurable changes in cognitive factors that affect human decision-making during conditionally automated (SAE Level 3) driving: trust in automation, mental workload, self-confidence, and risk perception. To infer changes in these factors, we collected psychophysiological metrics such as heart rate variability and galvanic skin response, behavioral metrics such as eye gaze and reliance on automation, and self-reports. Statistical tests revealed significant changes for some treatments, particularly in psychophysiological and behavioral metrics. However, other treatments did not elicit a significant change, highlighting the complexities of a between-subject experiment design with variations in multiple independent variables. Findings also underscore the importance of collecting heterogeneous human data to infer changes in cognitive factors during interactions with automation.

Keywords

automation, cognition, transportation and mobility

Introduction

To enhance human-automation teaming, it is necessary to incorporate human cognition models in the design and operation of autonomous systems. Researchers have studied cognitive factors that affect human decision-making in conditionally automated vehicles using event-based studies, with a focus on human responses to specific events such as take-over requests (TORs) (Melcher et al., 2015). Nevertheless, implementation of continuous-time and responsive automation in real-world scenarios ultimately necessitates a framework to (1) account for a broad range of operating conditions and (2) analyze a human’s cognitive factors throughout their interaction with the automated vehicle. Furthermore, while researchers have acknowledged that users’ workload, risk perception, and self-confidence may mitigate or influence the dominant relationship between user trust in automation and reliance (Riley, 1996), many existing studies examine only one or two of these factors at a time. Finally, studies in which multiple independent variables are varied simultaneously are limited. Varying one independent variable at a time is necessary to establish one-to-one relationships between a stimulus and its response, but this fails to capture interaction effects between different stimuli.

In this study, we present a novel experiment design involving multiple binary perturbations in different stimuli to elicit measurable changes in cognitive factors that affect human decision-making in SAE Level 3 autonomous vehicles, namely trust in the automation (trust), mental workload (workload), risk perception (risk), and self-confidence (confidence). To the best of our knowledge, this work is unique in its aim to analyze four cognitive factors in a non-deterministic autonomous driving experiment by manipulating four independent variables. The independent variables—task complexity, automation transparency, system reliability, and recommended control mode—are selected specifically to perturb these cognitive factors based on existing literature (Akash et al., 2020; Melcher et al., 2015). The dependent variables include self-reported cognitive factors as well as psychophysiological and behavioral metrics. Self-reports, while being a direct measure of a human’s cognitive factors, cannot be solicited too often without disrupting or distracting the participant and often suffer from bias (Rosenman et al., 2011). Conversely, psychophysiological (and some behavioral) metrics provide real-time continuous data to infer these factors. Psychophysiological sensing modalities used in this study include Heart Rate Variability (HRV) and Galvanic Skin Response (GSR). Behavioral measures include eye gaze and automation-reliance behaviors. While the broader objective of this work is to identify a computational model to predict the dynamics of human cognitive factors that affect decision-making during conditionally-automated driving, the scope of this study is to validate several established relationships between the independent and dependent variables in the context of this novel experiment. These relationships include the effects of changes in independent variables on the cognitive factors, and correlations between objective and subjective measures. 

The paper is organized as follows. We first discuss existing work and state our hypotheses. Next, we describe the experiment design, apparatus, procedure, and data processing and analysis used to test the hypotheses. This is followed by a presentation and discussion of our results. The paper concludes with a summary of our work and some limitations of this study.

Background and Hypotheses

Effects of Treatments

Previous work (van de Merwe et al., 2024) showed that task complexity increased workload in autonomous vehicle contexts. Accordingly, our first hypothesis is that higher task complexity leads to higher workload (H1). Akash et al. (2020) showed that increasing transparency can increase trust but also increases workload. Thus, we hypothesize that transparency increases trust (H2) and workload (H3). Multiple researchers including Hu et al. (2019) showed that lower reliability led to a decrease in trust. Further, Metzger and Parasuraman (2015) found that reliable automation reduced workload. Accordingly, we hypothesize that lower reliability reduces trust (H4) and higher reliability reduces workload (H5). Finally, we hypothesize that TORs increase workload (H6) and elicit a change in self-confidence (H7).

Inferring Cognitive Factors From Measurements

Previous work (Guo et al., 2016) showed that HRV metrics such as pNN50, RMSSD, and SDNN (defined in Table 2) captured changes associated with workload. Accordingly, we expect these metrics to correlate significantly with self-reported (SR) workload. Specifically, Veltman and Gaillard (1998) observed a systematic decrease in NN intervals (inter-beat intervals from which ectopic beats have been removed) as a task became more difficult during a simulated flight task. Likewise, we hypothesize that a negative correlation exists between mean RR interval (time between successive heartbeats) and SR workload (H8). McMahon et al. (2020) found that higher workload was associated with a higher number of Skin Conductance Response (SCR) events, so we hypothesize that SR workload positively correlates with the number of SCR events (H9). Further, Lu and Sarter (2019) found that total fixation duration and total number of fixations had an inverse relation with trust in the automation, so we hypothesize that a negative correlation exists between number of fixations and SR trust (H10). Finally, multiple studies in the literature (Kohn et al., 2021) have established the relationship between trust and reliance: higher trust in automation results in increased reliance. Thus, we hypothesize that a positive correlation exists between reliance and SR trust (H11).

Human Subject Study

Experiment Design

In this between-subject study, each participant drives an ego vehicle with SAE Level 3 automation on a predefined route (called a “drive”) in a simulated environment, wherein the independent variables are varied as binary signals. A change in the state of an independent variable is referred to as a treatment. To prevent abrupt changes in driving conditions, treatments occur only at intersections. In each half of a drive, two of the four independent variables are varied, and the other two are held at their default states. This is the simplest design that varies the fewest independent variables at a time while still allowing for interactions between them. There are two drives (A and B), and the pairs of independent variables varied in the treatments were chosen because they are hypothesized to perturb all four cognitive factors. The treatment summary for each half of each drive is illustrated in Figures 1 to 4.

Figure 1.

Drive A—First half: Automation transparency and task complexity are varied.

Figure 2.

Drive A—Second half: Automation transparency and recommended control mode are varied.

Figure 3.

Drive B—First half: Automation transparency and system reliability are varied.

Figure 4.

Drive B—Second half: Recommended control mode and system reliability are varied.

Treatments are triggered when the ego vehicle overlaps predefined waypoints. Task complexity is increased by introducing construction zone (CZ) obstacles (Figure 5). Automation transparency is turned ON by displaying bounding boxes via augmented reality (AR) for static and dynamic objects, including traffic lights, construction zones, and traffic vehicles, with different colors for different types of objects (Figure 6). Recommended control mode (RCM) in the “Manual” driving state requests that the participant take over the vehicle and drive manually, and in the “Auto” driving state indicates that the automation is ready to regain control. The first and third TORs (T10 and T15) are initiated in the presence of a school zone, while the second TOR (T13) is initiated without any cause. Lastly, system reliability is decreased by enabling lane deviation (LD) for the ego vehicle’s automation in the presence of faded lane markings.

Figure 5.

Task complexity is implemented using construction zone obstacles.

Figure 6.

Automation transparency is implemented as bounding boxes on static and dynamic objects.

Apparatus

The experiment was conducted with a driving simulator using Unreal Engine 5 and Logitech G29 steering wheel and pedals. A toggle was added to allow participants to take over control of the ego vehicle using the steering wheel or gas/brake pedal. A designated user interface was created to include information such as current speed (in miles per hour), current control mode, availability of automation, directions for navigation, system messages for TORs, prompts for self-reports, etc. The simulator also sent event markers to iMotions (iMotions 10.0.2, 2024) to record changes in independent variables as well as all user-initiated takeovers, pauses and resumes. Heart rate data and GSR data were collected using the Polar H10 sensor (Schaffarczyk et al., 2022) and the Shimmer3 sensor, respectively. Since the driving task required the use of participants’ hands, the Shimmer3 sensor was placed on the instep of the participant’s left foot (Van Dooren et al., 2012). Eye gaze data was collected using the Neon by Pupil Labs eye-tracking glasses.

Participants

A total of 37 participants (16 males, 21 females) aged between 18 and 48 (M = 23.30, SD = 6.19) with normal or corrected-to-normal vision (with contact lenses) participated in the study after providing informed consent. Twenty participants experienced Drive A, while 17 experienced Drive B in this between-subject study. All participants held valid US driver’s licenses and were screened for susceptibility to motion sickness. The study was approved by Purdue’s Institutional Review Board.

Procedure

Participants first completed a pre-experiment questionnaire and were briefed about the experiment. After sensor set-up, participants experienced a trial drive to get acclimated to the simulator controls and the process for providing self-reports (Table 1). At the trial drive’s end, participants were asked for their initial self-reported cognitive factors. Self-reports were solicited at intersections (boxed in Figures 1 to 4) by pausing the simulation at a red traffic light or stop sign. After completing the drive, participants completed a post-experiment questionnaire, followed by a semi-structured interview.

Table 1.

Prompts for Soliciting Self-reports on a Likert Scale of 0-100 in Increments of 5. Perceived Risk is Calculated as RS × RL/100.

Cognitive factor	Prompt
Trust	What is your level of trust in the automation?
Workload	How mentally demanding is the task?
Confidence	How confident are you in your capability to drive manually?
Risk likelihood (RL)	How likely would you think it is for an adverse event (in the driving scenario) to occur?
Risk severity (RS)	How severe would the consequence of that event be?
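
The perceived-risk calculation stated in the caption of Table 1 can be sketched as follows. This is a minimal illustration only: the function name and the range check are assumptions and are not taken from the study's software.

```python
def perceived_risk(risk_severity, risk_likelihood):
    """Combine two 0-100 self-reports into one 0-100 score: RS x RL / 100."""
    for value in (risk_severity, risk_likelihood):
        # Assumed validation: Table 1 describes a 0-100 scale.
        if not 0 <= value <= 100:
            raise ValueError("self-reports are expected on a 0-100 scale")
    return risk_severity * risk_likelihood / 100


print(perceived_risk(80, 50))  # 40.0
```

Dividing by 100 keeps the product on the same 0-100 scale as the other self-reported factors.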

Data Processing

Psychophysiological Metrics

GSR data was decomposed into phasic and tonic components using non-negative deconvolution as implemented in the Ledalab software (Benedek & Kaernbach, 2010). The deconvolution process also identifies SCRs as it presumes that the GSR signal is the impulse response of an SCR function convolved with a tonic component. HRV metrics were computed using the MarkusVollmer HRV Matlab Toolbox (Vollmer, 2019). Time-domain HRV metrics were calculated over a moving window of 80 NN intervals (Shaffer & Ginsberg, 2017). Based on event-marker timestamps, the continuous-time metrics were partitioned into 20 s pre- and post-stimulus windows over which psychophysiological metrics were computed. See Table 2 for a summary of metrics.
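
The partitioning into 20 s pre- and post-stimulus windows can be sketched as below. This is a hedged illustration, not the study's pipeline: the function name, the `(timestamp, value)` data layout, the use of the mean as the summary statistic, and the half-open window boundaries are all assumptions.

```python
def pre_post_means(samples, event_time, window_s=20.0):
    """Return (pre_mean, post_mean) of a metric around an event marker.

    samples: list of (timestamp_seconds, value) pairs (assumed layout).
    Windows are [event - window_s, event) and [event, event + window_s).
    """
    pre = [v for t, v in samples if event_time - window_s <= t < event_time]
    post = [v for t, v in samples if event_time <= t < event_time + window_s]
    if not pre or not post:
        raise ValueError("no samples in one of the windows")
    return sum(pre) / len(pre), sum(post) / len(post)


# Hypothetical signal sampled once per second that steps up at t = 100 s.
signal = [(t, 1.0 if t < 100 else 3.0) for t in range(60, 140)]
print(pre_post_means(signal, event_time=100))  # (1.0, 3.0)
```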

Table 2.

Summary of Psychophysiological Metrics.

Sensor	Metrics
GSR, phasic component	Net mean, geometric mean, and max. value
GSR, SCR	Amplitude, onset time, peak time
Heart rate (HR)	Standard deviation of NN intervals (SDNN)
Heart rate (HR)	Root mean square of successive RR interval differences (RMSSD)
Heart rate (HR)	Percentage of successive RR intervals that differ by more than 50 ms (pNN50)
Heart rate (HR)	Heart rate change, slope
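
For reference, the three time-domain HRV metrics in Table 2 can be computed from a sequence of NN intervals as in the sketch below. It uses the standard textbook definitions and is not the toolbox's implementation: the function names, the millisecond units, and the use of the sample (n − 1) standard deviation for SDNN are assumptions.

```python
import math
import statistics


def time_domain_hrv(nn_ms):
    """Compute SDNN, RMSSD, and pNN50 from NN intervals given in milliseconds."""
    if len(nn_ms) < 2:
        raise ValueError("need at least two NN intervals")
    # Differences between successive intervals drive RMSSD and pNN50.
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    sdnn = statistics.stdev(nn_ms)  # assumed: sample standard deviation
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    return {"SDNN": sdnn, "RMSSD": rmssd, "pNN50": pnn50}


def moving_hrv(nn_ms, window=80):
    """Apply time_domain_hrv over a sliding window of `window` NN intervals."""
    return [
        time_domain_hrv(nn_ms[i:i + window])
        for i in range(len(nn_ms) - window + 1)
    ]
```

The default `window=80` mirrors the moving window of 80 NN intervals described above; a recording shorter than the window yields an empty list.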

Behavioral Metrics

Reliance was defined to be the fraction of time spent in Auto mode between two treatments. For eye gaze metrics, fixations were classified into the following dynamic areas of interest (AOIs): cars, pedestrians, traffic lights, stop signs, and construction zones. Static AOIs were defined around mirrors, car body, console, vehicle mode, and dashboard horizon. The dashboard horizon AOI was selected because it was observed that many participants tended to rest their eyes on the road directly in front of them at the base of the windshield. Finally, AprilTags were included as an AOI to verify they were not distracting to participants. The classified fixations were used to compute the total fixation time, total number of fixations, maximum fixation time, mean fixation time, and standard deviation of fixation time for all AOI types in experimental blocks pre- and post-stimuli.
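
The reliance metric defined above, the fraction of time spent in Auto mode between two treatments, can be sketched as follows. The event-log layout (a time-sorted list of mode changes) and the function name are assumptions made for illustration.

```python
def reliance(mode_changes, start, end):
    """Fraction of the interval [start, end) spent in Auto mode.

    mode_changes: time-sorted list of (timestamp_seconds, mode) pairs, with
    mode either "Auto" or "Manual" and the first entry at or before `start`.
    """
    if end <= start:
        raise ValueError("end must be after start")
    auto_time = 0.0
    # Pair each mode change with the next one; the last mode lasts until `end`.
    following = mode_changes[1:] + [(end, None)]
    for (t, mode), (t_next, _) in zip(mode_changes, following):
        lo, hi = max(t, start), min(t_next, end)  # clip segment to the interval
        if mode == "Auto" and hi > lo:
            auto_time += hi - lo
    return auto_time / (end - start)


# Hypothetical log: takeover at 30 s, automation re-engaged at 50 s.
log = [(0, "Auto"), (30, "Manual"), (50, "Auto")]
print(reliance(log, start=10, end=90))  # 0.75
```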

Analysis

Statistical Model for Identifying Significant Treatments

The response of the $i$th participant after the $k$th treatment has occurred can be represented using a linear model given by

$$Y_{i,k} = B_i + T_{i,k} + T_{i,k-1} + \beta_i k \tag{1}$$

where $Y_{i,k}$ denotes the response observed in terms of a particular metric, $B_i$ denotes the participant baseline for that metric, $T_{i,k}$ denotes the effect size of the current treatment, and $T_{i,k-1}$ accordingly denotes the effect of the previous treatment. The term $\beta_i k$ represents a linear drift. By de-trending the collected observations ($Y_{i,k}$), Equation 1 can be rewritten as $Y_{i,k} = B_i + T_{i,k} + T_{i,k-1}$. Accordingly, the response before the first treatment is simply given by the participant baseline, $Y_{i,0} = B_i$. After the first treatment has occurred, the response is given by $Y_{i,1} = B_i + T_{i,1}$. It follows that the effect of Treatment 1 using data from the $i$th participant is $T_{i,1} = Y_{i,1} - Y_{i,0}$. Similarly, it can be shown that the effect of the $k$th treatment is given by $T_{i,k} = Y_{i,k} - Y_{i,k-1}$. Whether a treatment elicits a measurable change in a metric can be checked by performing a one-sample t-test on the treatment effect collected from participants, or equivalently a paired t-test on the de-trended responses pre- and post-treatment. Further, to account for inter-subject differences, the response in each metric is normalized at a participant level using the standard deviation of the responses for that metric across all treatments.
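
The test described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the analysis code used in the study: the function names are hypothetical, the responses are assumed to be already de-trended, and normalization is taken to mean division by the participant's sample standard deviation.

```python
import math
import statistics


def treatment_effects(responses):
    """Effects of treatments 1..K for one participant.

    responses: de-trended responses [Y_0, Y_1, ..., Y_K] for one metric.
    Each response is divided by the participant's SD (assumed normalization),
    and each effect is the difference of consecutive normalized responses.
    """
    sd = statistics.stdev(responses)
    if sd == 0:
        raise ValueError("constant responses cannot be normalized")
    normalized = [y / sd for y in responses]
    return [b - a for a, b in zip(normalized, normalized[1:])]


def one_sample_t(effects):
    """t statistic and degrees of freedom for H0: mean effect = 0."""
    n = len(effects)
    if n < 2:
        raise ValueError("need at least two participants")
    sem = statistics.stdev(effects) / math.sqrt(n)
    return statistics.fmean(effects) / sem, n - 1


# Hypothetical de-trended responses for three participants, treatments 0..2.
cohort = [[0.0, 2.0, 4.0], [1.0, 2.0, 5.0], [3.0, 4.0, 9.0]]
first_effects = [treatment_effects(r)[0] for r in cohort]
t_stat, dof = one_sample_t(first_effects)
```

The p-value follows from Student's t distribution with `dof` degrees of freedom; a statistics library routine such as SciPy's `scipy.stats.ttest_1samp` would return the same statistic together with the p-value.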

Correlation

To verify relationships between objective (psychophysiological or behavioral) and subjective (self-reported) measures, the Pearson correlation coefficient was computed for each participant between the objective and subjective metric of interest.
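
A per-participant Pearson correlation can be sketched as below. The function name and the data layout are assumptions. Note that the coefficient is undefined when either sequence is constant, which can occur when a participant gives the same self-report at every prompt.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for one participant: SR trust vs. reliance per segment.
sr_trust = [50, 60, 55, 70, 80]
reliance_fraction = [0.4, 0.6, 0.5, 0.8, 0.9]
r = pearson_r(sr_trust, reliance_fraction)
```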

Results

Effects of Treatments Across Participants

H1: Task complexity increased workload. Entering a construction zone for the first time (T2) resulted in an increase in self-reported (SR) workload (t(12) = 3.665, p = .00320), and leaving the construction zone a second time (T7) was associated with a decrease in SR workload (t(12) = −3.295, p = .00640).

H2: Transparency did not cause a significant increase in SR trust or reliance. However, when exiting the construction zone with automation transparency ON (T7 in Drive A), reliance increased (t(18) = 8.121, p = 1.98 × 10⁻⁷).

H3: Transparency did not increase workload.

H4: Decreasing reliability did not affect SR trust. However, decreasing reliability in the absence of automation transparency (T5 in Drive B) led to more fixations on dynamic objects (t(8) = 3.377, p = .00970), suggesting a decrease in trust (Lu & Sarter, 2019).

H5: Two of three treatments involving an increase in reliability (T4, T12 in Drive B) were associated with a decrease in the number of SCR events (t(13) = −2.321, p = .0371 respectively), indirectly suggesting a decrease in workload (McMahon et al., 2020).

H6: SR workload increased for the first TOR (T10) in Drive B (t(13) = 3.464, p = .0042). Further, pNN50 (t(17) = 2.230, p = .0395) and HR range (t(17) = 2.317, p = .0333) increased for the first TOR in Drive A, indicating an increase in workload.

H7: SR confidence decreased when a TOR without cause was issued (T13 in Drive A, t(12) = −2.234, p = .0453, and Drive B, t(13) = −2.470, p = .0281).

Correlation Between Subjective and Objective Measures

H8: SR workload negatively correlated with mean RR intervals for seven participants. SR workload also positively correlated with the slope of heart rate for six participants.

H9: SR workload did not correlate with number of SCR events. However, the maximum value of SCR positively correlated with SR workload for seven participants.

H10: No correlation was observed between SR trust and fixations on dynamic elements.

H11: SR trust positively correlated with reliance for seven participants.

Discussion and Limitations

Seven of the 11 hypotheses were supported. Importantly, significant changes in psychophysiological and behavioral metrics were observed across participants even when such changes were absent in self-reports. This supports a key feature of our experiment design, namely collecting heterogeneous human data to enable analysis of cognitive factors using a combination of psychophysiological, behavioral, and subjective measures. However, correlations between SR cognitive factors and objective measures were observed only for a few participants. Only seven participants for trust and four participants for perceived risk reported a mean absolute change higher than five (the quantization interval), suggesting a need to redesign the prompts used for self-reports. Analysis of pre-experiment and post-experiment questionnaire data may aid in identifying distinct behaviors and dispositional factors such as automation bias, which are not accounted for in this analysis. Finally, observations such as (a) an increase in reliance when exiting a construction zone with high transparency and (b) a decrease in trust when decreasing reliability in the absence of transparency suggest interaction effects between independent variables that should be analyzed in future work. In the current experimental design, participants in each drive experience the same order of treatments, which prevents us from using statistical models that can capture residual or interaction effects due to previous treatments. Therefore, future work will include a redesign of treatments to facilitate such analysis using tools such as linear mixed models.

Conclusions

In this paper, a novel experiment was presented for eliciting changes in multiple cognitive factors that affect human decision-making during conditionally automated (SAE Level 3) driving. Multiple heterogeneous dependent measures were collected to infer changes in participants’ trust in the automation, mental workload, self-confidence, and risk perception. While several findings from statistical analyses of the collected data are consistent with previous work, others underscore the complexities associated with a between-subject experiment design aimed at studying human cognitive behavior in scenarios involving continuous interaction with an automated system under several operating conditions. This motivates further research on this challenging problem.

Footnotes

Acknowledgements

We thank Dr. Robert Proctor for lending his expertise to the experiment design.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This material is based upon work supported by the National Science Foundation under Award No. 2145827. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

Akash, Jain, & Misu (2020, October 25–29). Toward adaptive trust calibration for level 2 driving automation. Proceedings of the 2020 international conference on multimodal interaction, Netherlands, pp. 538–547. https://doi.org/10.1145/3382507.3418885

Benedek & Kaernbach (2010). Decomposition of skin conductance data by means of nonnegative deconvolution. Psychophysiology, 47, 647–658. https://doi.org/10.1111/j.1469-8986.2009.00972.x

Guo, Tian, Tan, & Zhao (2016, November 11–13). Driver’s mental workload estimation based on empirical physiological indicators. 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, Hubei Province, China, pp. 344–347. https://doi.org/10.1109/YAC.2016.7804916

Hu, W.-L., Akash, Reid, & Jain (2019). Computational modeling of the dynamics of human trust during human–machine interactions. IEEE Transactions on Human-Machine Systems, 49(6), 485–497. https://doi.org/10.1109/THMS.2018.2874188

iMotions (10.0.2) (2024). iMotions A/S. Copenhagen, Denmark.

Kohn, S. C., de Visser, E. J., Wiese, Lee, Y.-C., & Shaw, T. H. (2021). Measurement of trust in automation: A narrative review and reference guide. Frontiers in Psychology, 12, 4977. https://doi.org/10.3389/fpsyg.2021.604977

Lu & Sarter (2019). Eye tracking: A process-oriented method for inferring trust in automation as a function of priming and system reliability. IEEE Transactions on Human-Machine Systems, 49(6), 560–568. https://doi.org/10.1109/THMS.2019.2930980

McMahon, Akash, Reid, & Jain (2020). On modeling human trust in automation: Identifying distinct dynamics through clustering of Markovian models. IFAC-PapersOnLine, 53(5), 356–363. https://doi.org/10.1016/j.ifacol.2021.04.113

Melcher, Rauh, Diederichs, Widlroither, & Bauer (2015). Take-over requests for automated driving. Procedia Manufacturing, 3, 2867–2873. https://doi.org/10.1016/j.promfg.2015.07.788

Metzger & Parasuraman (2015). Automation in future air traffic management: Effects of decision aid reliability on controller performance and mental workload. In Decision making in aviation. Routledge.

Riley (1996). Operator reliance on automation: Theory and data. In Parasuraman & Mouloua (Eds.), Automation and human performance: Theory and applications (1st edition, pp. 19–35). CRC Press.

Rosenman, Tennekoon, & Hill, L. G. (2011). Measuring bias in self-reported data. International Journal of Behavioural & Healthcare Research, 2(4), 320–332. https://doi.org/10.1504/IJBHR.2011.043414

Schaffarczyk, Rogers, Reer, & Gronwald (2022). Validity of the polar H10 sensor for heart rate variability analysis during resting state and incremental exercise in recreational men and women. Sensors, 22(17), 6536. https://doi.org/10.3390/s22176536

Shaffer & Ginsberg, J. P. (2017). An overview of heart rate variability metrics and norms. Frontiers in Public Health, 5, 258. https://doi.org/10.3389/fpubh.2017.00258

van de Merwe, Mallam, & Nazir (2024). Agent transparency, situation awareness, mental workload, and operator performance: A systematic literature review. Human Factors, 66(1), 180–208. https://doi.org/10.1177/00187208221077804

Van Dooren, De Vries, J. J. G., & Janssen, J. H. (2012). Emotional sweating across the body: Comparing 16 different skin conductance measurement locations. Physiology & Behavior, 106(2), 298–304. https://doi.org/10.1016/j.physbeh.2012.01.020

Veltman, J. A., & Gaillard, A. W. K. (1998). Physiological workload reactions to increasing levels of task difficulty. Ergonomics, 41(5), 656–669. https://doi.org/10.1080/001401398186829

Vollmer (2019). HRVTool—an open-source Matlab toolbox for analyzing heart rate variability. 2019 Computing in Cardiology (CinC), pp. 1–4. IEEE. https://doi.org/10.22489/CinC.2019.032

A Novel Experiment Design for Studying Multiple Cognitive Factors in Conditionally Automated Driving Contexts
