Abstract
Objective
We experimentally test the effect of cognitive load on auditory susceptibility during automated driving.
Background
In automated vehicles, auditory alerts are frequently used to request human intervention. To ensure safe operation, human drivers need to be susceptible to auditory information. Previous work found reduced susceptibility during manual driving and, to a lesser extent, during automated driving. However, in practice, drivers also perform nondriving tasks during automated driving, and the associated cognitive load may further reduce their susceptibility to auditory information. We therefore study the effect of cognitive load during automated driving on auditory susceptibility.
Method
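Twenty-four participants were driven in a simulated automated car. Concurrently, they performed a task with two levels of cognitive load: repeating a spoken noun (repeat) or generating a related verb (generate). Auditory susceptibility was indexed by the frontal P3 (fP3) event-related potential elicited by novel sounds in an auditory oddball paradigm.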
Results
The fP3 was significantly lower during automated driving with cognitive load than during automated driving without it. The difficulty level of the cognitive task (repeat or generate) showed no effect.
Conclusion
Engaging in other tasks during automated driving decreases auditory susceptibility as indicated by a reduced fP3.
Application
Nondriving tasks can create additional cognitive load. Our study shows that performing such tasks during automated driving reduces susceptibility to auditory alerts. This can inform designers of semi-automated vehicles (SAE levels 3 and 4), in which human intervention might be needed.
Keywords
Introduction
Automation in everyday life is rapidly increasing. Although automation can take over tasks from the human, many forms of automation involve both the human and the system (e.g., Dekker & Woods, 2002; Parasuraman & Riley, 1997; Parasuraman et al., 2000; Sheridan & Verplank, 1978). Such shared-control systems require the human operator to be informed of the system state. In the past, such systems were typically operated by skilled, well-trained, professional users such as airplane pilots and control room operators. However, automation increasingly finds its way into consumer products, which are operated by nonprofessional users who lack extensive training (Janssen, Donker, et al., 2019). Therefore, intuitive design of these systems becomes even more important.
The domain of automated driving is one of the fields that has seen an increasing amount of automation. The Society of Automotive Engineers (SAE) distinguishes six levels of vehicle automation (SAE International, 2018). These levels differ in which tasks are performed by the driver (human) and which by the vehicle (machine). At SAE levels 3 and 4, the automated vehicle is expected to be able to drive for prolonged periods without human intervention (within specific operational design domains). However, at times, the human might be required (SAE level 3) or requested without obligation (SAE level 4) to assist the automation. Although the way in which the car alerts the driver to such requests can vary between systems, auditory signals are a likely candidate: they are omnidirectional, already widely applied in cars, and yielded relatively fast response times across multiple studies of SAE level 2 cars (Zhang et al., 2019).
As humans are expected to continue to play a role in many forms of (semi-)automated driving (Noy et al., 2018), it is important to understand how well the human brain processes auditory alerts in general. Is this general ability, for example, reduced under automated driving conditions? And how is this general ability to process auditory alerts impacted when someone is performing a nondriving task while the automated vehicle is driving without human intervention? We investigate those questions in this paper using a technique from neuroscience, which is described next.
Frontal P3 (fP3) as a Measure of Susceptibility
In this manuscript, we refer to the brain’s general ability to process alerts as susceptibility.
To assess auditory susceptibility, we use the auditory novelty oddball paradigm (for a review, see Polich, 2007), which consists of a stream of frequent, identical standard tones mixed with infrequent (semi-)unique novel sounds (novels). Concurrent recording of brain activity (EEG ERP: electroencephalogram event-related potential) can then be used to quantify the cortical activation elicited by the novel probes, corrected for the activation elicited by the standards. The most prominent feature of this novelty-oddball response is the so-called frontal P3 (fP3) in the ERP: a positive peak over frontal regions (e.g., electrode FCz) around 300 ms after stimulus onset (Allison & Polich, 2008; Squires et al., 1975; Ullsperger et al., 2001), with a larger fP3 indicating greater susceptibility to the stimulus.
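To make the novel-minus-standard correction concrete, here is a minimal sketch for a single electrode: artifact-free epochs are averaged per stimulus type into an ERP, and the standard ERP is subtracted from the novel ERP. The function names and the three-sample toy epochs are hypothetical; real epochs contain hundreds of samples per channel.

```python
from statistics import fmean

def average_erp(epochs):
    """Average equal-length single-channel epochs (microvolts) sample by sample."""
    n_samples = len(epochs[0])
    if any(len(e) != n_samples for e in epochs):
        raise ValueError("all epochs must have the same length")
    return [fmean(e[i] for e in epochs) for i in range(n_samples)]

def difference_wave(novel_epochs, standard_epochs):
    """Novel-minus-standard difference wave; its frontal positivity near 300 ms is the fP3."""
    novel = average_erp(novel_epochs)
    standard = average_erp(standard_epochs)
    return [n - s for n, s in zip(novel, standard)]

# Toy example with 3 samples per epoch (hypothetical values).
novels = [[0.0, 9.0, 1.0], [0.0, 11.0, 1.0]]
standards = [[0.0, 2.0, 1.0], [0.0, 4.0, 1.0]]
print(difference_wave(novels, standards))  # [0.0, 7.0, 0.0]
```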
The fP3 is a relatively generic response (Friedman et al., 2001; Kenemans, 2015; Polich, 2007; Wessel & Aron, 2013), elicited by any sufficiently salient event. In the current study these are auditory novels, but the salient event can also be visual, emotionally laden, or occasional auditory or visual countermanding signals (see Kenemans, 2015 for examples). In relation to our aforementioned definition of susceptibility, note also that the fP3, as an evolving process, has been associated with a direct consequence for behavior in the form of a behavioral interrupt, or a transient general slowing of the motor system (Kenemans, 2015).
The fP3 has therefore been widely used to index susceptibility in a variety of conditions and tasks, including driving (Wester et al., 2008; van der Heiden et al., 2018), mental fatigue during driving (Massar et al., 2010), manual tracking (Scheer et al., 2016, 2018), games (e.g., Allison & Polich, 2008; Miller et al., 2011), arithmetic (e.g., Ullsperger et al., 2001), and cognitive tasks without visual or manual components (van der Heiden et al., 2020). Susceptibility can also be reduced by factors that are not tied to a task, such as alcohol (Wester et al., 2010) and passive fatigue (Massar et al., 2010). In other words, the fP3 response probes the more general susceptibility of the brain to external signals. We therefore prefer the term susceptibility over closely related terms such as inattentional deafness (which is tied to auditory stimuli; e.g., Scheer et al., 2018), attentional reorienting (Corbetta et al., 2008; Corbetta & Shulman, 2002; Schröger & Wolff, 1998), and workload (for a review, see Murphy et al., 2017), the latter two of which are tied to even more specific mechanisms. Other perspectives have focused more on potential predictors of reduced susceptibility, such as EEG alpha-rhythm power (O’Connell et al., 2009), which is known to increase greatly across hours of monotonous driving (e.g., Schmidt et al., 2009).
For the domain of driving, previous work found a reduction in fP3 response (i.e., indicating a reduction in susceptibility to novel stimuli) under driving and automated driving conditions (Wester et al., 2008; van der Heiden et al., 2018) when compared with a stationary (nondriving) baseline. It has not been explored how performing additional tasks during automated driving (e.g., a telephone call) affects auditory susceptibility. In-vehicle nondriving tasks can take many forms, and their variety is expected to increase with higher levels of automation (e.g., Banks et al., 2018; Carsten et al., 2012; Llaneras et al., 2013; Pfleging et al., 2016). To be able to measure the effects of performing additional tasks during (automated) driving on auditory susceptibility, we need to induce cognitive load in a systematic way.
To this end, we use the verb task (Abdullaev & Posner, 1998; Petersen et al., 1989; Snyder et al., 1995). In this task, participants hear nouns and either need to repeat the noun (repeat task) or respond with a related verb (generate task).
We included both a repeat and a generate version of the task.
Study Aim and Hypotheses
We test how induced additional cognitive load influences general susceptibility to auditory stimuli while people are driven by an automated vehicle. We hypothesize that fP3 is reduced (i.e., indicating a reduced susceptibility to auditory stimuli) when:
Cognitive load is added during automated driving (using either the repeat or the generate task) compared with stationary and automated driving without additional tasks (compare Abdullaev & Posner, 1998; Snyder et al., 1995).
Automated driving is combined with generating a verb compared with automated driving while repeating a noun, as the generating task is hypothesized to create more cognitive load (due to active search within the semantic network; Abdullaev & Posner, 1998; Snyder et al., 1995).
The car drives automatically, compared with when it is stationary (compare van der Heiden et al., 2018).
Method
Participants
We conducted a power analysis in G*Power 3.1.9.4. With an effect size (d) of 0.71 (the effect size for the fP3 peak difference between stationary and automated in van der Heiden et al., 2018), an α level of .0125 (the level used in the pairwise comparisons), and a power of 0.8, at least 22 participants were required.
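The required sample size can be approximated in a few lines. The sketch below is a stand-in for G*Power, not a reproduction of it: it uses a standard normal approximation for a paired t-test plus a small-sample correction term, and it assumes one-tailed testing (in line with the directional planned contrasts). The text does not state which tail was used; under the one-tailed assumption this approximation gives 22, whereas the same approximation gives 26 for a two-tailed test.

```python
from math import ceil
from statistics import NormalDist

def paired_sample_size(d, alpha, power, one_tailed=True):
    """Approximate number of participants for a paired t-test.

    Normal approximation with a small-sample correction term (z_alpha**2 / 2);
    an approximation only, not G*Power's exact noncentral-t computation.
    """
    z = NormalDist().inv_cdf                      # standard normal quantile function
    z_alpha = z(1 - alpha) if one_tailed else z(1 - alpha / 2)
    z_beta = z(power)
    n = ((z_alpha + z_beta) / d) ** 2 + z_alpha ** 2 / 2
    return ceil(n)

# Values from the text: d = 0.71, alpha = .0125 (corrected), power = 0.8.
print(paired_sample_size(0.71, 0.0125, 0.8))                    # 22
print(paired_sample_size(0.71, 0.0125, 0.8, one_tailed=False))  # 26
```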
Twenty-four participants (21 F; 3 M) were recruited through on-campus flyers, word of mouth, and advertisement on the participant pool website of the university. Participants were 23 years old on average (ages 18 to 55,
This research complied with the tenets of the Declaration of Helsinki and was approved by the Institutional Review Board of the Faculty of Social and Behavioral Sciences of Utrecht University (FETC16-042). Informed consent was obtained from each participant. Participants were compensated with either €12 or course credits for their time.
Materials
Driving Simulator
A medium-fidelity fixed-base driving simulator, based on an original Green Dino three-screen setup, was used. The setup (Figure 1) included three 40-inch screens and surround sound. OpenDS 4.5 (www.opends.eu) was used as simulator software. The driving environment consisted of a three-lane highway that followed the trajectory of two semicircles with a radius of 1135.9 m (one clockwise, one counterclockwise). The automated car drove in the middle lane of the highway at 80 km/hr. There were no other cars in the driver’s lane, but cars occasionally drove in the other lanes (at 87 km/hr in the left lane and 73 km/hr in the right lane).
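Some derived quantities help picture the route. The sketch below assumes, as the text implies but does not state outright, that the route consists only of the two semicircles (no straight sections); under that assumption one full S-shaped loop equals one circumference, and the constant 80 km/hr gives the loop time, the gentle lateral acceleration in the curves, and the distance covered in a block of about 3 min.

```python
import math

RADIUS_M = 1135.9   # radius of each semicircle (from the text)
SPEED_KMH = 80.0    # speed of the automated car (from the text)
BLOCK_S = 180.0     # "about 3 min" per block (approximate)

speed_ms = SPEED_KMH / 3.6                 # about 22.2 m/s
loop_length_m = 2 * math.pi * RADIUS_M     # two semicircles = one circumference (assumes no straights)
lap_time_s = loop_length_m / speed_ms
lateral_acc = speed_ms ** 2 / RADIUS_M     # centripetal acceleration in m/s^2
block_distance_m = speed_ms * BLOCK_S

print(f"loop {loop_length_m:.0f} m, lap {lap_time_s:.0f} s, "
      f"lateral acceleration {lateral_acc:.2f} m/s^2, 3-min block {block_distance_m:.0f} m")
# loop 7137 m, lap 321 s, lateral acceleration 0.43 m/s^2, 3-min block 4000 m
```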

Figure 1. Driving simulator setup with a participant wearing a 64-electrode EEG cap.
A direct mapping onto SAE levels is not appropriate given the relatively simple driving scenario (e.g., there were no sudden events), and no such mapping was provided to participants. Our scenario is closest to SAE level 4 (SAE International, 2018), in that the driver was not asked for any driving-related action (i.e., there were no transitions of control). However, unlike the requirements in SAE level 4, our participants were instructed to sit still and look at the road. Therefore, our results should not be tied to specific SAE levels (as that would require further testing) but should rather be read as an indication of general human susceptibility to sounds during prolonged periods in which a person is being driven by a car while performing other tasks (in our case: generating verbs or repeating nouns). A driving simulator was used because previous fP3 ERP results from simulated manual driving seem to replicate well in on-the-road driving (Wester, 2009). In the stationary condition, the car stayed at the start location with the engine idling. The other cars, however, still occasionally drove in the other lanes.
Presentation of Auditory Stimuli
Two types of auditory stimuli were used in this experiment: oddball probe stimuli and verb task stimuli. All stimuli were presented using Presentation (Neurobehavioral Systems) at 75 dB through Earlink earphones.
Oddball probe
We used a two-stimulus novelty oddball probe (van der Heiden et al., 2020). In 75% of cases, the stimuli consisted of a standard sound: a 1000 Hz pure tone of 400 ms. In 25% of cases, the stimuli consisted of novel sounds: environmental sounds such as a dog barking or a human sneezing, that were taken from a database by Fabiani and Friedman (1995). The database consisted of 100 unique sounds that were between 159 ms and 399 ms in duration.
Verb generation and noun repetition task stimuli
Nouns were presented for a verb “generate” task (responding to a noun by saying a related verb) or a noun “repeat” task (repeating the noun); see design. Previous work suggests that the generate task (compared with the repeat task) induces more cognitive load (Abdullaev & Posner, 1998; Snyder et al., 1995), stronger dual-task interference (compare Iqbal et al., 2010; Kunar et al., 2008; Strayer & Johnston, 2001; van der Heiden et al., 2019), and increased activity in the frontal cortex (Abdullaev & Posner, 1998; Bijl et al., 2007). As our aim is to study how the fP3 response changes under automated driving as a function of additional load, we included both a noun repetition and a verb generation version of the task.
For the materials, a set of 96 spoken nouns was used in the verb generation and noun repetition task. In the
Since our participants were Dutch, we used a Dutch translation by van der Heiden et al. (2020) of spoken nouns based on an English set used by Abdullaev and Posner (1998). For the current study, we used only 96 of the 144 nouns used by van der Heiden et al. (2020): as each block had 32 words (see Design), the total number of words had to be a multiple of 32. We selected the 96 words with the fewest errors on trials in which participants had to repeat the words in van der Heiden et al. (2020).
As described in more detail in van der Heiden et al. (2020), word selection focused on using words that are familiar to Dutch speakers, and which could be presented in a short time interval. Only Dutch words that had one or two syllables were used. Per word, a WAV sound file was generated using text-to-speech website www.texttospeech.io with default settings of the text-to-speech algorithm (Dutch female, volume 1, rate 1, pitch 1). Nouns of which presentation took longer than 500 ms were removed. For the remaining words, the tempo was adjusted per word, such that each noun had a playback time of exactly 400 ms.
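The length normalization amounts to a per-word playback-rate factor. The sketch below only computes that ratio and applies the 500-ms exclusion rule; the durations are hypothetical, and the tool and algorithm used for the actual tempo change are not specified in the text.

```python
MAX_RAW_MS = 500.0   # nouns longer than this were removed (from the text)
TARGET_MS = 400.0    # every retained noun was adjusted to this length (from the text)

def tempo_factor(raw_ms):
    """Playback-rate factor mapping raw_ms onto TARGET_MS (>1 speeds up, <1 slows down).

    Returns None for nouns longer than MAX_RAW_MS, which were excluded.
    """
    if raw_ms > MAX_RAW_MS:
        return None
    return raw_ms / TARGET_MS

# Hypothetical raw durations (ms) of synthesized nouns.
for raw in (340.0, 400.0, 480.0, 520.0):
    print(raw, tempo_factor(raw))
```

A factor of 1.2, for instance, means a 480-ms recording is played 20% faster so that it lasts 400 ms.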
Design
To assess the effect of cognitive load added on top of automated driving, we used a single-factor within-subjects design with four levels: stationary, automated, automated + repeat, and automated + generate. This allowed us to assess the effect of cognitive load over and above that of automated driving, as well as the effect of automated driving relative to being stationary. Within each block, participants heard both standard tones and novel sounds. The fP3 response is calculated as the difference wave in the event-related potential between novels and standards (see Signal Recording).
Testing Blocks
There were 12 experimental blocks, each about 3 min long. Each experimental condition (i.e., stationary, automated, automated + repeat, and automated + generate) was used in three blocks. Within each set of four consecutive blocks, every condition occurred once, and the order was varied between participants. For the first set of four blocks, the order was counterbalanced across participants. For the remaining two sets, orders were shuffled such that participants received orders different from those they had received before. For example, participant 1 experienced the first set in the order automated without extra task (A), automated + generate (AG), stationary (S), automated + repeat (AR); the orders of the second and third sets were S, AG, A, AR and S, AR, AG, A, respectively.
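One way to generate such block orders is sketched below. It assumes (this is not stated in the text) that counterbalancing of the first set used all 4! = 24 possible orders, one per participant, which fits the sample of 24; later sets are drawn at random from orders the participant has not yet received. Condition labels follow the abbreviations above; the function name and seed are hypothetical.

```python
import itertools
import random

CONDITIONS = ("S", "A", "AR", "AG")  # stationary, automated, automated + repeat, automated + generate

def block_orders(n_participants=24, n_sets=3, seed=0):
    """Per participant, a list of n_sets condition orders (one per set of four blocks)."""
    all_orders = list(itertools.permutations(CONDITIONS))  # 4! = 24 possible orders
    rng = random.Random(seed)
    first = all_orders[:]
    rng.shuffle(first)                  # assumption: one distinct first-set order per participant
    schedule = []
    for p in range(n_participants):
        orders = [first[p % len(first)]]
        while len(orders) < n_sets:
            candidate = rng.choice(all_orders)
            if candidate not in orders:  # never repeat an order within a participant
                orders.append(candidate)
        schedule.append(orders)
    return schedule

schedule = block_orders()
print(schedule[0])
```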
Within each experimental block, 80 oddball probes were presented. In blocks where automation was combined with verb generation (AG) or noun repetition (AR), there were three types of stimuli: nouns (for the generate or repeat task; each stimulus exactly 400 ms), standards, and novels. To test the effect that the cognitive process associated with verb generation (AG) or noun repetition (AR) had on fP3 response, we carefully balanced when these stimuli were presented in the AG and AR blocks. Specifically, per block, 16 nouns were played immediately preceding a standard oddball probe, 16 immediately preceding a novel oddball, and 48 standards were played without a prior noun presentation. If a probe followed a noun presentation, the next probe was presented 4400 ms after the onset of the preceding oddball stimulus to prevent interference from speech production. On all other trials (where no noun was played, including trials of the S and A blocks), the interval between the onset of two probe stimuli was 2000 ms (compare Wester et al., 2008; van der Heiden et al., 2018).
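The trial structure of an AG or AR block can be sketched as a shuffled list of (noun-before-probe, probe-type) pairs. The counts and intervals come from the text; the unconstrained shuffle is an assumption, as the text does not describe any sequencing constraints. Summing the probe-to-probe intervals gives about 237 s for a task block and 80 × 2 s = 160 s for S and A blocks, which brackets the approximate 3-min block length given above.

```python
import random

SOA_AFTER_NOUN_MS = 4400  # interval to the next probe when the probe followed a noun
SOA_DEFAULT_MS = 2000     # all other probe-to-probe intervals

def make_task_block(seed=None):
    """Trial list for one AG or AR block as (noun_before_probe, probe_type) tuples.

    Counts follow the text; the plain random shuffle is an assumption.
    """
    trials = ([(True, "standard")] * 16
              + [(True, "novel")] * 16
              + [(False, "standard")] * 48)
    random.Random(seed).shuffle(trials)
    return trials

def block_duration_s(trials):
    """Approximate block length: the sum of probe-to-next-probe intervals."""
    total_ms = sum(SOA_AFTER_NOUN_MS if noun else SOA_DEFAULT_MS for noun, _ in trials)
    return total_ms / 1000

task_block = make_task_block(seed=1)
print(len(task_block), block_duration_s(task_block))  # 80 236.8
print(80 * SOA_DEFAULT_MS / 1000)                     # 160.0 (S and A blocks)
```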
For the word task, 96 different nouns were used. To vary these between blocks, we made six sets of 32 nouns: three sets for the generate task (together containing all 96 unique words, shuffled) and three for the repeat task (again containing all 96 words). The order of words within a set was randomized for each participant. In effect, each word was used twice per participant: once in the generate task and once in the repeat task.
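The word-set construction can be sketched as follows: each task gets its own shuffled copy of all 96 nouns, cut into three sets of 32, so that every noun occurs once per task. The placeholder nouns and the per-participant seed are hypothetical.

```python
import random

def make_word_sets(nouns, set_size=32, seed=None):
    """Split the nouns into sets per task ('generate', 'repeat').

    Each task receives its own shuffled copy of all nouns, cut into
    len(nouns) // set_size sets, so every noun occurs exactly once per task.
    """
    if len(nouns) % set_size:
        raise ValueError("number of nouns must be a multiple of the set size")
    rng = random.Random(seed)
    sets = {}
    for task in ("generate", "repeat"):
        shuffled = list(nouns)
        rng.shuffle(shuffled)
        sets[task] = [shuffled[i:i + set_size] for i in range(0, len(shuffled), set_size)]
    return sets

nouns = [f"noun{i:02d}" for i in range(96)]  # placeholders for the 96 Dutch nouns
sets = make_word_sets(nouns, seed=7)
print({task: [len(s) for s in task_sets] for task, task_sets in sets.items()})
# {'generate': [32, 32, 32], 'repeat': [32, 32, 32]}
```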
Procedure
Participants received verbal and written information about the experiment and then provided written consent. Next, in an intelligibility test, all nouns were played to the participant, who was asked to repeat each noun after playback. To verify that all nouns were intelligible, the experimenter noted any nouns that were repeated incorrectly.
The experimenter then applied the EEG electrodes. Participants were then told that they should not hold the steering wheel because the car would drive on its own and manual input would not be needed. A practice block followed in which participants performed the verb generation task for 1 min while being driven by the automated vehicle and hearing the oddball probes. The participant then performed the 12 experimental blocks, with a few minutes of rest after every four blocks. After the experiment, participants were asked to fill out a questionnaire on demographics and general feedback. The total experiment lasted just under 2 hr.
Signal Recording
EEG Setup
EEG was recorded using a BioSemi ActiveTwo system with 64 active Ag/AgCl electrodes positioned following the international 10/10 system (Sharbrough, 1991), and the standard BioSemi CMS/DRL online reference, at a sample rate of 2048 Hz. Two electrodes were placed on the mastoids for later re-referencing to average mastoids. Four ocular electrodes were applied to enable offline ocular-artifact control with horizontal and vertical electrooculography (HEOG and VEOG). After measuring the head circumference, a matching EEG cap was applied. Conductive gel was applied, and the corresponding electrodes were plugged in.
Signal analysis was done in BrainVision Analyzer 2.1 (Brain Products GmbH, München, Germany), following procedures similar to those in earlier work (van der Heiden et al., 2018, 2020; Wester et al., 2008). We first downsampled the data to 256 Hz (after anti-aliasing filtering). Data were then re-referenced to the average mastoid signal. A high-pass filter of 0.16 Hz, a low-pass filter of 30 Hz, and a 50-Hz notch filter were applied. We then created segments for each of the four conditions, for both standard and novel probes, starting 1000 ms before and ending 1500 ms after oddball probe onset. Before calculating the ERPs, we applied the Gratton and Coles ocular correction to compensate for eye movements during the recorded segments (Gratton et al., 1983). Artifacts in individual channels were rejected by the following criteria within an epoch: maximum voltage step >120 μV/ms within 200 ms before or after events; maximum difference >100 μV within 200 ms; minimum activity <0.5 μV within 100 ms. Finally, grand averages were created for each of the conditions. Our analysis focuses on a
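The three rejection criteria can be expressed as simple checks on one channel of one epoch. The sketch below is not BrainVision Analyzer's implementation: it uses the thresholds from the text at the 256-Hz post-downsampling rate, converts window lengths to whole samples, and flags the whole channel epoch when any criterion is violated, without modeling how Analyzer marks the 200-ms intervals around offending events.

```python
import math

def _window_len(ms, fs_hz):
    """Number of samples spanning `ms` milliseconds (at least 2)."""
    return max(2, round(ms * fs_hz / 1000.0))

def _ranges(x, win):
    """Peak-to-peak amplitude (max - min) of every window of `win` consecutive samples."""
    for i in range(len(x) - win + 1):
        seg = x[i:i + win]
        yield max(seg) - min(seg)

def is_artifact(x, fs_hz=256.0, max_step_uv_per_ms=120.0, max_diff_uv=100.0,
                diff_win_ms=200.0, min_activity_uv=0.5, low_win_ms=100.0):
    """True if a single-channel epoch (microvolts) violates any of the three criteria."""
    ms_per_sample = 1000.0 / fs_hz
    # 1. Gradient: voltage step between consecutive samples, in uV per ms.
    if any(abs(b - a) / ms_per_sample > max_step_uv_per_ms for a, b in zip(x, x[1:])):
        return True
    # 2. Max-min difference above 100 uV within any 200-ms window.
    if any(r > max_diff_uv for r in _ranges(x, _window_len(diff_win_ms, fs_hz))):
        return True
    # 3. Low activity: max-min below 0.5 uV within any 100-ms window.
    if any(r < min_activity_uv for r in _ranges(x, _window_len(low_win_ms, fs_hz))):
        return True
    return False

fs = 256.0
n = 640                                                             # -1000 to 1500 ms at 256 Hz
clean = [20 * math.sin(2 * math.pi * 10 * i / fs) for i in range(n)]  # 10-Hz, 20-uV sine
flat = [0.0] * n
jump = clean[:320] + [v + 150 for v in clean[320:]]                # 150-uV baseline shift
print(is_artifact(clean), is_artifact(flat), is_artifact(jump))    # False True True
```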
To determine the time interval in which the fP3 peak occurred at electrode FCz, we used a collapsed localizer: the ERPs for all four conditions were collapsed, and the interval 285–335 ms after stimulus onset was found to best represent the fP3 peak area in this collapsed waveform. Because the interval is chosen without regard to condition, this selection does not favor any particular condition contrast. We took the mean amplitude in this fP3 interval for the statistical analysis.
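A collapsed localizer can be sketched as follows: average the difference waves of all conditions, locate the peak of that average, and take a fixed window around it, so that the window is chosen without reference to any condition difference. The search range (200 to 450 ms) and the half-width of 25 ms (matching the 50-ms width of the 285–335 ms window) are assumptions, and the Gaussian-shaped waves are synthetic.

```python
import math

def collapsed_window(diff_waves, times_ms, search_ms=(200.0, 450.0), half_width_ms=25.0):
    """Window of +/- half_width_ms around the peak of the condition-averaged difference wave.

    diff_waves maps condition -> novel-minus-standard wave (uV) at one electrode.
    """
    n_cond = len(diff_waves)
    collapsed = [sum(w[i] for w in diff_waves.values()) / n_cond for i in range(len(times_ms))]
    in_search = [i for i, t in enumerate(times_ms) if search_ms[0] <= t <= search_ms[1]]
    peak = max(in_search, key=lambda i: collapsed[i])
    return times_ms[peak] - half_width_ms, times_ms[peak] + half_width_ms

def mean_amplitude(wave, times_ms, window):
    """Mean amplitude of one wave within [start, end] (inclusive)."""
    vals = [v for v, t in zip(wave, times_ms) if window[0] <= t <= window[1]]
    return sum(vals) / len(vals)

# Synthetic difference waves: positivity at 310 ms with condition-specific heights.
fs = 256.0
times = [-1000.0 + i * 1000.0 / fs for i in range(640)]
heights = {"S": 10.0, "A": 8.0, "AR": 5.0, "AG": 5.0}
waves = {c: [h * math.exp(-0.5 * ((t - 310.0) / 40.0) ** 2) for t in times]
         for c, h in heights.items()}

window = collapsed_window(waves, times)
print(window)
print({c: round(mean_amplitude(w, times, window), 2) for c, w in waves.items()})
```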
Speech Response Time
To check our cognitive load manipulation, we measured speech response time. Based on earlier literature, we expected response times to be faster when participants merely repeat a noun than when they need to generate a verb (e.g., Iqbal et al., 2010; van der Heiden et al., 2019). However, we expected no difference between nouns followed by a standard tone and nouns followed by a novel sound. We used a microphone connected to the auxiliary input of the BioSemi system. Speech production was considered to start when the average level of the microphone signal (calculated as a moving average over 15 samples) exceeded a threshold of 1000 μV. As speech response time, we took the interval from noun offset (i.e., oddball probe onset) to the start of speech production. We excluded the first four participants from this analysis because no microphone was present during their sessions. We did not record the content of what participants said.
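The onset criterion can be sketched as a moving-average threshold detector. Two details are assumptions not given in the text: the microphone signal is rectified (absolute value) before averaging, and onset is timed at the last sample of the first window whose mean exceeds 1000 μV. The sampling rate of the auxiliary channel is also not stated, so it is a parameter; the 1000-Hz trial below is synthetic.

```python
def speech_onset_ms(mic_uv, fs_hz, start_index=0, threshold_uv=1000.0, window=15):
    """Latency (ms) from start_index (probe onset) until speech is detected, or None.

    Speech is detected at the first sample where the moving average of the
    rectified signal over `window` samples exceeds threshold_uv.
    """
    rectified = [abs(v) for v in mic_uv]
    for end in range(start_index + window - 1, len(rectified)):
        avg = sum(rectified[end - window + 1:end + 1]) / window
        if avg > threshold_uv:
            return (end - start_index) * 1000.0 / fs_hz
    return None

# Synthetic trial at 1000 Hz: 300 ms of silence, then an alternating 2000-uV "voice" signal.
fs = 1000.0
trial = [0.0] * 300 + [2000.0 if i % 2 else -2000.0 for i in range(500)]
print(speech_onset_ms(trial, fs))  # 307.0
```

With this window, the average crosses 1000 μV once more than half of the 15 samples contain speech, so detection lags true onset by about 8 samples.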
Statistical Analysis
For statistical analysis, we use R (R Core Team, 2014), with an α level of .05. Partial eta-squared is used as the effect size. For fP3 results, we analyze the difference wave (novel minus standard, expressed in μV) using a one-way (omnibus) ANOVA with four levels: stationary, automated, automated + repeat, and automated + generate. For pairwise comparisons, we used four planned contrasts that test the expected ordering, namely that extra tasks increase load and reduce the fP3. Specifically, we tested whether: (1) automated was lower than stationary, (2) automated + repeat was lower than automated, (3) automated + generate was lower than automated, and (4) automated + generate was lower than automated + repeat. To control the family-wise error rate, we considered a difference significant at α / 4 (i.e., .05/4 = .0125).
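The planned comparisons and the effect size can be written compactly. Here μ_c denotes the mean novel-minus-standard amplitude at FCz in the 285–335 ms window for condition c ∈ {S, A, AR, AG}; this notation, and the one-sided formulation of each hypothesis (following the "lower than" wording above), are a reading of the directional predictions rather than the exact model specification used in R.

```latex
\begin{aligned}
\psi_1 &= \mu_{S} - \mu_{A}, \qquad \psi_2 = \mu_{A} - \mu_{AR}, \qquad
\psi_3 = \mu_{A} - \mu_{AG}, \qquad \psi_4 = \mu_{AR} - \mu_{AG},\\
H_{1,k} &: \psi_k > 0 \quad (k = 1, \dots, 4), \qquad
\alpha_{\mathrm{contrast}} = \frac{\alpha}{4} = \frac{.05}{4} = .0125,\\
\eta_p^2 &= \frac{SS_{\mathrm{condition}}}{SS_{\mathrm{condition}} + SS_{\mathrm{error}}}
\end{aligned}
```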
For speech-response time (expressed in ms), we use a 2 (Oddball probe: Standard or Novel) × 2 (Cognitive load-inducing task: repeat or generate) ANOVA.
Results
Frontal P3
For each of the four conditions (i.e., stationary, automated, automated + repeat, and automated + generate), we calculated the fP3 difference wave at electrode FCz (i.e., the difference between the responses to the novel and standard probes). Figure 2 shows the fP3 peak; dashed lines indicate the interval whose mean value was used for the statistical analysis. There was a main effect of condition on the mean fP3 peak activation,

Figure 2. Event-related potentials of the four conditions (stationary, automated, automated + generate, automated + repeat). Vertical lines show the onset of the oddball stimulus (time point 0 ms), the onset of the noun stimulus (−400 ms, in gray), and the fP3 peak area used for statistical analysis (285–335 ms).

Figure 3. Scalp maps for 50-ms time intervals from 25 ms to 475 ms after oddball probe onset. The average of the mastoids is used as reference.
Speech Response Time
Figure 4 shows the average speech activation level for the different conditions over time, measured from noun offset (i.e., oddball probe onset). As the green line shows no consistent background noise in trials without a noun, we excluded all trials without a noun from the statistical analysis.

Figure 4. Average speech activation level for the different conditions; no speech activation is expected when no noun is presented. Dashed lines show activation for the repeat condition; solid lines show activation for the verb generation condition. Red lines show the task combined with a standard tone; blue lines show the task combined with a novel sound. Note that time point 0 corresponds to noun offset and probe onset. The gray areas indicate when in the trial a noun was presented and when fP3 peak activation was analyzed in the ERP data (Figure 2).
A 2 (Oddball probe: Standard or Novel) × 2 (Cognitive load-inducing task: repeat or generate) ANOVA showed that there was no main effect of oddball probe
In other words, our manipulation of cognitive load succeeded: responses took longer in the generate condition than in the repeat condition (compare Iqbal et al., 2010; van der Heiden et al., 2019). There was no effect of the type of oddball stimulus (standard or novel).
Comparison to Manual Driving and Single-Task Verb Generation
This study found that the fP3 peak is reduced when a cognitive load-inducing task is performed during automated driving conditions. For a wider context, we compared our results to those from two previous studies in our lab that were run by the same team, with the same EEG setup and comparable stimuli (van der Heiden et al., 2018, 2020). Figure 5 shows bar diagrams of the average fP3 amplitude of the novel-minus-standard difference wave as observed in this study and in the previous studies.

Figure 5. Comparison of fP3 response amplitudes among three studies: van der Heiden et al. (2018), van der Heiden et al. (2020), and the current study. See text for details.
Brief Description of Previous Studies’ Methodology
van der Heiden et al. (2018) manipulated within subjects whether participants were in a stationary control condition (watching a screenshot of a road), being driven by an automated vehicle, or driving manually. The driving task was performed in a low-fidelity simulator (Logitech steering wheel and pedals, one screen); the scenario was a trajectory that looped between driving on a regular road, merging onto a highway with other traffic, and exiting back to the regular road. For the oddball stimuli, the 2018 study used a three-stimulus novelty oddball paradigm, containing standard tones (80% of stimuli; same stimuli as here), novel sounds (10% of stimuli; same stimuli as here), and deviant tones (10% of stimuli; 1100 Hz tones). In addition to the within-subjects driving manipulation, the authors manipulated between subjects whether participants had to press a button when hearing a deviant tone (active condition) or not (passive condition).
van der Heiden et al. (2020) used a two-stimulus oddball paradigm (without deviants, as here), in which 80% of oddball stimuli were standards and 20% were novels (same stimuli as here). Within each block, some oddball stimuli were not preceded by a noun (baseline control); other oddball stimuli were preceded by a noun with an offset of 0 ms, 200 ms, or 400 ms. Participants always had to respond to a noun by generating a verb. The 2020 study included neither a repeat condition nor a driving condition.
Comparison of Results
In all three studies (van der Heiden et al., 2018, 2020; current study), the fP3 response (and the associated susceptibility to novel stimuli) is highest in the baseline conditions (in van der Heiden et al., 2018: stationary), with amplitudes around 10–12 μV. The exception is the passive condition of van der Heiden et al. (2018), which has a slightly lower peak value (main effect of active/passive).
In both van der Heiden et al. (2018) and the current study, the condition where there is automated driving without another task lowers the mean fP3, which was significant in the 2018 study but not here (here:
In other words, a floor effect seems to occur in three situations: manual driving (van der Heiden et al., 2018), generating verbs (van der Heiden et al., 2020), or combining automated driving with repeating or generating (current study). Another perspective is that introducing any concurrent task, irrespective of its difficulty and specific processing demands (manual driving, repeating words, or generating words), induces costs of concurrence as such (Kok, 2001).
General Discussion
This study found that the fP3 peak is reduced when drivers are performing an additional (cognitive load-inducing) task under automated driving conditions. Previous research on the verb task suggests that the generate condition should lead to more cognitive load than the repeat condition (compare Iqbal et al., 2010; Kunar et al., 2008; Strayer & Johnston, 2001; van der Heiden et al., 2019). We therefore expected that the fP3 response might be lower in the generate (while automated driving) condition than in the repeat (while automated driving) condition. In contrast to our expectations and previous research, our study did not find a difference between the generate and repeat conditions in fP3 peak. It is unlikely that this reflects cognitive load induced by response production in both conditions: whereas this could hold for repeat, overt responses, and therefore preparatory response-production processes, occurred much later in generate, very probably too late to affect the production of the fP3. Rather, the lack of a differential fP3 could reflect equal cognitive load in repeat and generate, induced by response production in the former and by semantic search (preceding response production) in the latter.
For the difference between stationary (no task) and automated driving (without an additional load-inducing task), the pattern was in the expected direction, with the fP3 response highest in the stationary condition (compare van der Heiden et al., 2018). However, we did not statistically replicate the finding that automated driving by itself (i.e., without the addition of a secondary task) lowers auditory susceptibility, as indicated by a decrease in the fP3 peak, compared with being stationary (van der Heiden et al., 2018). It is conceivable that this difference was less clear in the current study because the context of the verb-generation task lends a general relevance to all auditory stimulation. Similarly, the reduction of fP3 when driving compared with when stationary has been reported to disappear when the sequence of probes contains additional stimuli that require a behavioral response (van der Heiden et al., 2018; Wester et al., 2008, active conditions; see also Figure 5).
In the present study, we did find that performing an additional cognitive task during automated driving reduces susceptibility. This is a relevant finding, given people’s tendency to perform other nondriving tasks in semi-automated driving settings (e.g., Banks et al., 2018; Carsten et al., 2012; Dunn et al., 2019; Llaneras et al., 2013), and the likelihood that auditory signals will be part of alerts in (semi-)automated vehicles to require (SAE level 3) or request (SAE level 4) human assistance. Another way of interpreting these results (compare Figure 5) is that replacing a human task (e.g., driving) with automation frees cognitive resources, allowing for higher susceptibility to unexpected stimuli (i.e., the fP3 is higher in automated than in manual driving conditions). However, in practice, drivers might perform additional tasks (e.g., out of boredom; Dunn et al., 2019). In an irony of automation (Bainbridge, 1983), our results suggest that automating a task could then (through drivers’ engagement in additional tasks) decrease rather than increase human susceptibility.
An alternative view is inspired by our analysis of the speech data, which revealed a median voice-onset latency of 287 ms during repeat, relative to probe onset (Figure 4). This indicates that a considerable part of the vocal response was produced while information was still being sampled from the probe stimulus, or immediately afterward. This may have induced a form of (backward) masking that reduced the difference between the novel and standard fP3, perhaps to an extent comparable to that in the generate condition (in which median voice-onset latencies were much later, i.e., 680 ms). Further work is needed to determine whether, and how strongly, the repeat and generate conditions can be differentiated and, more generally, how different levels of cognitive load affect the fP3 response and associated susceptibility under automated driving conditions.
Our comparison of fP3 magnitude with those observed in previous studies (Figure 5) suggests a floor effect in fP3 response in three situations: manual driving (van der Heiden et al., 2018; see also Wester et al., 2008), generating verbs (van der Heiden et al., 2020), or combining automated driving with repeating or generating (current study). Although automated driving by itself does not necessarily bring susceptibility to the lowest levels, as soon as another task is combined with it (be it some manual driving as in van der Heiden et al., 2018, or a cognitive task), susceptibility is reduced.
A low level of susceptibility might be problematic during manual driving, as the associated brain process is interpreted to reflect orienting to novel stimuli and susceptibility to new information (Friedman et al., 2001; Kenemans, 2015; Polich, 2007). Reduced susceptibility could, for example, impair the ability to orient (and subsequently respond) to an unexpected alert, or to an unexpected sound or event in the driving environment such as a dog running after a ball. Reduced susceptibility is probably even more problematic under automated driving conditions at SAE level 3, where the driver might be engaged in a nondriving task while the automation controls the vehicle, but where the vehicle can demand human assistance at any time. Our work suggests that under such conditions, humans might have a generally reduced susceptibility to alerts. Moreover, as prolonged work on a nondriving task might have limited their situational awareness of the driving environment, their ability to act might also be reduced.
Although reduced susceptibility may not always lead to failed detection, susceptibility should ideally be high whenever alerts are critical. System designers should take this reduced susceptibility into account and develop strategies to overcome it, for example, by using multi-modal alerts or pre-alerts (Borojeni et al., 2018; Van der Heiden et al., 2017).
A comparable approach to issues of cognitive load and susceptibility during process control has been offered by Strayer and colleagues (e.g., 2013; 2015). In their EEG-based analysis, the focus is on a P3 response over posterior cortical regions (also known as the “P3b” response), which is normally elicited by events that are both relatively rare and task relevant (e.g., targets for a behavioral response such as an emergency brake). The presently used fP3 (sometimes also referred to as “P3a”) is typically elicited by (highly) salient novels without any demand for an overt response. In this way, it provides a continuous yet unobtrusive measure for the susceptibility to potentially critical events that are outside the focus of direct task-associated attention. This is relevant in the context of automated driving, where drivers might occasionally focus on other tasks (e.g., writing an e-mail, handling a phone call) while the automation is handling most of the driving task. In addition, methodologically, the fP3 (or P3a) and P3b seem to differ in their ability to be captured under dynamic driving conditions. Whereas effects observed for the P3b under simulated manual driving did not always replicate under driving conditions in an instrumented vehicle (Strayer et al., 2013, 2015), for the fP3 (or P3a), previous studies did replicate effects between simulated driving and on-the-road driving (see Wester, 2009, chapters 5 and 6).
Limitations and Future Work
Although in the current study both conditions in which a cognitive load-inducing task was present (i.e., automated + generate and automated + repeat) showed a reduced fP3 response compared with automated driving alone and with the stationary condition, we did not find a difference between the two cognitive load-inducing task conditions. This might be due to the timing of our probe; as outlined above, it may have induced masking effects in the repeat condition. One way to avoid this is to apply a delayed-response setting in which voice onsets during repeat are forced to occur much later, although admittedly this could induce undesired working memory load. Another option is to use longer intervals between noun and probe. Our previous study (van der Heiden et al., 2020) showed that this does not affect the fP3 during generating verbs, but this may not hold for repeating nouns (after the voice response, the fP3 may well recover to a single-task level).
The timing of our measurement is a more general limitation of this work. We probed susceptibility at a fixed interval: 0 ms after presentation of the noun stimulus. This interval was chosen because previous work that involved only the generate task found that extending the interval between stimulus and probe to 200 or 400 ms does not influence the level of measured susceptibility compared with presenting the probe directly after the stimulus (van der Heiden et al., 2020). Future work could also look into the effect over longer time spans, such as 1 s after stimulus offset. It remains an open question whether susceptibility is fully restored after the oral response to the verb task (i.e., whether it reflects a phasic process), or whether some reduction in susceptibility persists (i.e., a tonic process).
A limitation of our set-up, in which the generate and repeat task trials were always followed by an oddball probe, is that the noun might function as a cue for the probe and thereby affect the fP3 response, because the oddball stimulus becomes more predictable. Moreover, at that moment, listening to auditory input is behaviorally relevant (because a response to the noun is needed). Previous work suggests that actively engaging in an auditory task at random times (i.e., occasionally pressing a button in response to a specific tone) can increase auditory susceptibility in general (van der Heiden et al., 2018). Therefore, if anything, the predictability of the probe might have resulted in relatively higher fP3 activation. If the effect of the cue were controlled, even lower levels of fP3 activation might be found in the repeat and generate conditions.
Implications for Practice
Our results show that cognitive load can reduce general susceptibility to alerts. It is therefore important for safety-critical systems to take into account the possibility of a delayed or absent response from the human operator due to such reduced susceptibility. In the case of automated driving, safety-critical alerts such as handover-of-control requests might therefore build in resilient mechanisms, such as multi-modal alerts or earlier “pre-alerts” that forewarn a driver about an upcoming transition of control (Borojeni et al., 2018; Van der Heiden et al., 2017). Future work can examine in more detail the qualities of specific alarm types for different in-car applications.
Key Points
An oddball probe was used to elicit an fP3 ERP to measure the effect of a cognitive load-inducing task during automated driving.
We found that the fP3 is reduced when performing a task that induces cognitive load. This reduction may reflect load induced by response production, or masking in one condition (repeat) and load induced by semantic search in the other (generate).
The results of this study can be used to inform designers of safety-critical systems.
Footnotes
Acknowledgments
Remo van der Heiden was supported by the Dutch Traffic Authority (Rijkswaterstaat). Christian Janssen was supported by a Marie Sklodowska-Curie fellowship of the European Commission (H2020-MSCA-IF-2015, grant agreement 705010, Detect and React). The funders had no role in study design, data collection, analysis, decision to publish, or manuscript preparation. We would like to thank Nina Haukes for her assistance with the data collection. Preliminary results of this work were presented at the Auto-UI 2019 conference during the work-in-progress track (Janssen, van der Heiden, Donker, & Kenemans, 2019).
Author Biographies
Remo M. A. van der Heiden obtained his PhD in experimental psychology at Utrecht University (2020). He received his master’s degree in applied cognitive psychology from Utrecht University in 2015.
J. Leon Kenemans is a full professor of biopsychology and human psychopharmacology at Utrecht University. He received his PhD in psychophysiology from Utrecht University in 1990.
Stella F. Donker is an associate professor of experimental psychology at Utrecht University. She received her PhD in movement sciences from the University of Groningen in 2002.
Christian P. Janssen is an assistant professor of experimental psychology at Utrecht University. He received his PhD in human–computer interaction from UCL in 2012.
