Attention Limitations in the Detection and Identification of Alarms in Close Temporal Proximity

Abstract

Objective

The aim of this study was to establish the effects of simultaneous and asynchronous masking on the detection and identification of visual and auditory alarms in close temporal proximity.

Background

In complex and highly coupled systems, malfunctions can trigger numerous alarms within a short period of time. During such alarm floods, operators may fail to detect and identify alarms due to asynchronous and simultaneous masking. To date, the effects of masking on detection and identification have been studied almost exclusively for two alarms during single-task performance. This research examines 1) how masking affects alarm detection and identification in multitask environments and 2) whether those effects increase as a function of the number of alarms.

Method

Two experiments were conducted using a simulation of a drone-based package delivery service. Participants were required to ensure package delivery and respond to visual and auditory alarms associated with eight drones. The alarms were presented at various stimulus onset asynchronies (SOAs). The dependent measures included alarm detection rate, identification accuracy, and response time.

Results

Masking was observed intramodally and cross-modally for visual and auditory alarms. The SOAs at which asynchronous masking occurred were longer than reported in basic research on masking. The effects of asynchronous and, even more so, simultaneous masking became stronger as the number of alarms increased.

Conclusion

Masking can lead to breakdowns in the detection and identification of alarms in close temporal proximity in complex data-rich domains.

Application

The findings from this research provide guidance for the design of alarm systems.

Keywords

Alarm design, temporal attention limitations, masking, multimodal displays, dual-task performance

Introduction

In many complex data-rich domains, such as aviation and process control, system safety depends greatly on the timely detection and correct identification of alarms. Alarm detection and identification pose a challenge to operators because the number and degree of coupling of systems in many workplaces have increased significantly. As a result, a single anomaly can trigger a series of related alarms in a very short period of time and thus turn into an alarm flood (Perrow, 2011). An alarm flood is defined as more than 10 alarms in a 10-minute period, but this rate is often exceeded in real-world settings (EEMUA, 1999). For example, in the 1994 Texaco Refinery explosion (Milford Haven, Wales), two operators had to respond to 275 alarms during the last 10.7 minutes of the accident, or approximately one alarm every two seconds. The operators therefore missed critical alarms that could have helped them diagnose and resolve the crisis (HSE, 1997).

The risk of operators missing critical signals during an alarm flood is rooted in perceptual and attentional limitations related to different types of masking. Masking occurs when one stimulus is obscured by the presence of another stimulus (Enns et al., 2000). It can affect two stimuli that are temporally overlapping (e.g., see research on change blindness; Lu, 2014; Rensink, 2002; Simons & Levin, 1997) or contiguous (Breitmeyer & Öğmen, 2006). When the two stimuli are temporally contiguous, masking can affect the detection of the first stimulus (known as backward masking) or the second stimulus (known as forward masking). As an example of forward masking, the attentional blink is the failure to detect the second of two target stimuli when both are presented in close temporal proximity (200–600 ms apart, according to the attention literature). Masking effects were first and primarily studied in the visual modality, but more recent studies have confirmed them for auditory stimuli, both intramodally between two auditory stimuli (Arnell & Jolicœur, 1999; Carhart et al., 1969; Doll et al., 1992; Fastl & Zwicker, 2006; Pavani & Turatto, 2008) and cross-modally between visual and auditory stimuli (Arnell, 2006; Arnell & Jolicœur, 1999; Arnell & Larson, 2002; Van der Burg et al., 2010; Van der Burg et al., 2013). In a survey of industrial operators (from a chemical manufacturer, a confectionery company, and a nuclear power plant), masking was reported as the most frequent reason for missing an alarm, accounting for nearly 50% of all cases (Stanton, 1993).

There is an abundance of basic research on change blindness and attentional blink; however, the contribution of these two phenomena to breakdowns in alarm detection in real-world environments is still unclear. For one, almost all research on masking to date has focused exclusively on the detection and identification of just two stimuli. It is therefore unclear to what extent findings from those studies apply to real-world settings where operators experience significantly larger numbers of alarms. It has been shown that a person’s ability to report multiple concurrent or temporally close targets decreases with an increase in the number of stimuli (Boot et al., 2007; Burr et al., 2010). Other limitations of earlier research on masking include that the vast majority of studies have examined the two phenomena in the visual modality only, and participants in these studies were most often responsible only for detecting target stimuli. In contrast, alarm floods in real-world domains involve exceedingly large numbers of visual and auditory signals, and operators in these fields have to timeshare multiple tasks. Finally, based on limited empirical evidence, the timing at which forward masking is experienced in more complex and demanding environments appears to be longer than suggested by basic research on masking. For example, Ferris et al. (2006) observed a higher miss rate with an SOA (stimulus onset asynchrony) of 1000 ms (compared to an SOA of 500 ms for attentional blink). Establishing the actual SOA range for asynchronous masking of visual and auditory stimuli in complex data-rich environments is important for being able to predict and counteract the failure to detect alarms in real-world settings.

The reported research aimed to address the above limitations in the literature. To this end, two experiments were conducted using an unmanned aerial vehicle (UAV) control simulation. Participants were tasked with acting as supervisory controllers of a commercial drone-based package delivery system. The first experiment aimed to identify the SOAs at which masking effects are observed in a demanding multitask environment. The second experiment had two objectives: (1) to establish the relative contributions of simultaneous and asynchronous masking to missed and misdiagnosed visual and auditory alarms under varying workload (i.e., during routine operations and in an alarm flood) and (2) to investigate the effect of the number and temporal distribution of alarms on alarm detection and identification.

Experiment 1: Establishing the SOA Range for Asynchronous Masking

The goals of this experiment were to understand to what extent masking effects are observed in a demanding multitask environment and at what SOA range they are experienced. Based on the review of previous studies, the following hypotheses were formulated:

• H1-1: Due to masking, when two alarms are presented in close temporal proximity, the detection rate and accuracy for the second alarm will be lower, compared to single alarms that appear in isolation.

• H1-2: The effect described in H1-1 will be observed with an SOA longer than the 200–600 ms range reported in earlier basic research.

• H1-3: The effect described in H1-1 will be observed both intra- and cross-modally for visual and auditory alarms.

• H1-4: Alarm floods will result in lower detection rates and identification accuracy, and in longer response times to alarms, compared to routine operations.

Method

Participants

The participants in this study were 15 students recruited from the College of Engineering at the University of Michigan. All participants were between 20 and 35 years old (mean = 24.2 years, SD = 3.4 years; 8 males and 7 females). They all reported having normal or corrected-to-normal vision, normal color vision, and normal hearing ability. This research complied with the American Psychological Association Code of Ethics and was approved by the Institutional Review Board at the University of Michigan (UM IRB: HUM00144319). Informed consent was obtained from each participant.

Apparatus and Tasks

The study used a simulated interface of a ground-based drone control system. The setup consisted of a computer with a keyboard, an optical mouse, a 23-inch LCD monitor, a pair of stereo speakers, and an audio recorder.

During the experiment, participants were required to perform three tasks: delivery consent, alarm detection and identification, and air traffic control monitoring. Delivery consent was required whenever a drone reached a customer residence. Simulated video feeds from eight drones were presented on the screen (see Figure 1). Each drone followed a predetermined flight path to customer residences. Once a drone reached a delivery address, it hovered in midair. The participant needed to determine whether the customer had placed a delivery pad (with the letter “H” on it) on the ground. Upon detecting the pad, the participant had to press the top button (with a “target” symbol) next to the corresponding drone window to give consent to the delivery of the package. If the pad was not present, the participant needed to click on the second button from the top (showing the same “target” symbol but with a line drawn across it) to cancel the delivery. The drone would then proceed to the next customer without dropping the package.

Figure 1.

Drone simulation interface (displaying a red visual alarm for drone 8; requiring the response “R8”).

Throughout the experiment, visual and auditory alarms were presented whenever there was a temporary link loss between any of the drones and the ground control station. Visual alarms were presented in the central window of the screen, which was divided into a 3x3 grid (see Figure 1). The outer rectangles of the central alarm grid mapped spatially onto the eight drone windows. A visual alarm was displayed as a small red or blue box in the rectangle adjacent to the affected drone (see Figure 1 for an example of an alarm related to drone #8; the color coding is explained below). Auditory alarms were presented using a synthesized male or female voice.

There were two categories of alarms which were defined by the color (for visual alarms) or the gender of the voice (for auditory alarms). Type A alarms included red visual alarms and male voice auditory alarms for drones 1, 3, 6, and 8, as well as blue visual alarms and female voice auditory alarms for drones 2, 4, 7, and 9. Type B alarms included the remaining alarms (see Table 1). The two alarm categories required different responses. For type A alarms, the participants were required to report the first letter of the alarm color or voice (“R” for red visual alarms, “B” for blue visual alarms, “M” for male voice auditory alarms, and “F” for female voice auditory alarms), followed by the number of the affected drone. For type B alarms, the required response was the reverse: participants had to report the number of the affected drone first, followed by the first letter of the color or voice of the alarm. For example, a red visual alarm for drone #8 required a response of “R8,” while a female voice auditory alarm for drone #6 required a response of “6F.” This manipulation was introduced based on pilot testing to ensure that the tasks of identifying and responding to the alarms were sufficiently difficult.

Table 1.

Alarm type (A/B) as a function of drone number, modality, and color/voice.

	Drone number
Alarm type	1	2	3	4	6	7	8	9
Visual red alarm	A	B	A	B	A	B	A	B
Visual blue alarm	B	A	B	A	B	A	B	A
Auditory male voice	A	B	A	B	A	B	A	B
Auditory female voice	B	A	B	A	B	A	B	A
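
The response rule described above, together with the assignments in Table 1, can be summarized in a short sketch. This is only an illustration of the rule as stated in the text, not the software used in the experiment; the function and constant names are hypothetical.

```python
# Illustrative encoding of the Experiment 1 response rule.
# All names here are hypothetical; only the rule itself comes from the text and Table 1.
DRONES = (1, 2, 3, 4, 6, 7, 8, 9)      # drone numbers listed in Table 1
RED_MALE_TYPE_A = {1, 3, 6, 8}         # drones whose red / male-voice alarms are type A
LETTERS = {"red": "R", "blue": "B", "male": "M", "female": "F"}


def alarm_type(drone: int, feature: str) -> str:
    """Return 'A' or 'B' for a drone number and a color or voice feature."""
    if drone not in DRONES or feature not in LETTERS:
        raise ValueError(f"unknown alarm: drone={drone}, feature={feature}")
    in_first_group = drone in RED_MALE_TYPE_A
    red_or_male = feature in ("red", "male")
    # Red/male alarms are type A for drones 1, 3, 6, 8;
    # blue/female alarms are type A for drones 2, 4, 7, 9.
    return "A" if in_first_group == red_or_male else "B"


def expected_response(drone: int, feature: str) -> str:
    """Type A: letter then drone number. Type B: drone number then letter."""
    kind = alarm_type(drone, feature)
    letter = LETTERS[feature]
    return f"{letter}{drone}" if kind == "A" else f"{drone}{letter}"
```

Applied to the two examples given in the text, a red visual alarm for drone 8 is type A and a female voice alarm for drone 6 is type B, so the response order differs between them.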

A third task required participants to monitor a continuous audio recording of air traffic control. They were monitoring for three pre-recorded messages containing their call sign “DOI51.” These messages were randomly distributed throughout an actual tower recording. To increase the difficulty of the task, another three messages were added that contained similar information but did not include the call sign “DOI51.” Whenever participants heard the call sign, they were required to press the space bar on the keyboard as quickly as possible. This task was introduced to create resource competition in the auditory channel, similar to the competing demands for visual attention of the delivery consent and the visual alarm monitoring tasks.

Experiment Design and Procedure

Alarms were presented either as single alarms (temporally separated from other alarms by more than 5 s) or as alarm pairs, where two alarms were separated by a short SOA (stimulus onset asynchrony) of either 200, 600, 800, 1000, or 1200 ms. The auditory alarms were generated using a text-to-speech (TTS) service, so they had natural onsets and offsets. An alarm pair consisted of any combination of visual and/or auditory alarms (see Table 2). Alarm position was also varied (i.e., an alarm could be a single alarm or the first or second alarm in an alarm pair).

Table 2.

Types of alarms and alarm pairs.

	SOA (ms)	Modality
Single alarms	N/A	Visual, Auditory
Alarm pairs	200, 600, 800, 1000, 1200	V+V, V+A, A+V, A+A
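
As a reading aid, the alarm-pair conditions in Table 2 can be enumerated by crossing the five SOAs with the four modality orders. The sketch below assumes a full crossing, which the text implies but does not state explicitly; the variable names are hypothetical.

```python
from itertools import product

SOAS_MS = (200, 600, 800, 1000, 1200)   # SOAs used for alarm pairs
MODALITIES = ("V", "A")                 # visual, auditory

# Each pair condition: (SOA in ms, "first+second" modality order).
pair_conditions = [
    (soa, f"{first}+{second}")
    for soa, first, second in product(SOAS_MS, MODALITIES, MODALITIES)
]

# Single alarms have no SOA and come in one of two modalities.
single_conditions = list(MODALITIES)

print(len(pair_conditions), "pair conditions;", len(single_conditions), "single conditions")
```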

The experiment consisted of three 20-minute scenarios, named early (alarm) flood, late (alarm) flood, and (alarm) flood only, respectively (see Figure 2). Each scenario included 17 minutes of routine operations and a 3-minute alarm flood. The only difference between the early and late flood scenarios and the flood only scenario was that, in the latter, alarms were presented only during the 3-minute alarm flood. The three scenarios were designed so that participants were unable to anticipate when the alarm flood would occur. In each scenario, the delivery consent task was presented 30 times (about once every 40 seconds). During routine operations in the early and late flood scenarios, 20 of the delivery consent tasks were followed by a single alarm (10 times) or an alarm pair (10 times). In the flood only scenario, no alarms were presented during routine operations. The alarm flood started 5, 10, or 15 minutes into the early flood, flood only, and late flood scenario, respectively. During the 3-minute alarm flood, 30 single alarms and 30 alarm pairs were presented to the participant. The order of alarms was randomized. Air traffic control messages were presented about once per minute throughout the scenario.
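
The event rates implied by this design follow from simple arithmetic on the numbers above. The sketch below treats each alarm pair as two alarms and compares the resulting flood rate with the EEMUA (1999) definition quoted in the Introduction; the variable names are hypothetical.

```python
SCENARIO_MIN = 20        # scenario length in minutes
FLOOD_MIN = 3            # alarm flood length in minutes
CONSENT_TASKS = 30       # delivery consent tasks per scenario
FLOOD_SINGLES = 30       # single alarms during the flood
FLOOD_PAIRS = 30         # alarm pairs during the flood (two alarms each)

consent_interval_s = SCENARIO_MIN * 60 / CONSENT_TASKS   # 40.0 s between consent tasks
flood_alarms = FLOOD_SINGLES + 2 * FLOOD_PAIRS           # 90 alarms in the flood
flood_rate_per_min = flood_alarms / FLOOD_MIN            # 30.0 alarms per minute
mean_onset_gap_s = FLOOD_MIN * 60 / flood_alarms         # 2.0 s per alarm on average

# EEMUA (1999) flood definition: more than 10 alarms in a 10-minute period.
eemua_rate_per_min = 10 / 10
ratio_to_definition = flood_rate_per_min / eemua_rate_per_min   # 30.0
```

On this reading, the simulated flood averaged one alarm every two seconds, which is comparable to the rate reported for the Texaco Refinery accident described in the Introduction.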

Figure 2.

Three scenarios and the distribution of alarms.

Upon arrival at the laboratory, participants were asked to read and sign a printed consent form. Their chair was then adjusted so that their eyes were about 20 inches from the center of the screen. At this distance, the size of the visual alarms equaled 0.125 rad visual angle. Next, participants were given instructions for their three tasks and told to give equal priority to all tasks. They then received four to five 5-minute training sessions to familiarize themselves with the tasks until they reached a combined accuracy of 90% or higher across all three tasks. Following the training sessions, the participant took a short break before starting the three 20-minute experiment sessions. The order of the three sessions was counterbalanced between participants. Before each session, cross-modal matching was performed by adjusting the volume of the auditory alarms to match the intensity of the visual alarms (Colman, 2015; Pitts et al., 2016). The entire experiment lasted about 100 minutes.

Dependent Measures

The dependent measures in this study were detection rate and accuracy for the alarm monitoring task, accuracy and response time for the delivery consent task, and accuracy and response time for the air traffic control monitoring task. The alarm detection rate was defined as the percentage of alarms that the participant responded to, regardless of the accuracy of the response. Accuracy was assessed based on the percentage of correct responses to alarms, out of all responses. For the delivery consent task, accuracy was calculated as the percentage of correct responses (giving consent only when a delivery pad was present and vice versa). Response time was defined as the time from when the drone stopped to when the participant pressed the button. For the air traffic control monitoring task, accuracy was determined based on the percentage of correct responses (only responding when the call sign was announced). Response time was defined as the time from the start of the call sign to the time when the space key was pressed.
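
The two alarm measures use different denominators: detection rate is computed over all alarms presented, whereas accuracy is computed over responses only. A minimal sketch of these definitions follows; the record structure and the example data are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AlarmTrial:
    expected: str             # correct response, e.g. "R8"
    response: Optional[str]   # None if the participant did not respond


def detection_rate(trials: Sequence[AlarmTrial]) -> float:
    """Percentage of alarms that received any response, correct or not."""
    if not trials:
        raise ValueError("no trials")
    responded = sum(t.response is not None for t in trials)
    return 100.0 * responded / len(trials)


def accuracy(trials: Sequence[AlarmTrial]) -> float:
    """Percentage of correct responses out of all responses (misses excluded)."""
    responded = [t for t in trials if t.response is not None]
    if not responded:
        raise ValueError("no responses to score")
    correct = sum(t.response == t.expected for t in responded)
    return 100.0 * correct / len(responded)


# Hypothetical example: four alarms, one missed, one reported in the wrong order.
example = [
    AlarmTrial("R8", "R8"),
    AlarmTrial("6F", "F6"),
    AlarmTrial("M1", None),
    AlarmTrial("B2", "B2"),
]
```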

Results

The data from this experiment were analyzed using SPSS. Detection rate and accuracy were recorded as binary true/false data and analyzed using the generalized linear mixed model (GLMM, with a binary logit link) in SPSS. Response time was analyzed using the linear mixed model (LMM) in SPSS. For all analyses, the significance level was set at p < .05. Error bars in the various figures represent standard errors.

Since response time for the delivery consent task was not assessed during training, Pearson’s correlation coefficients between the response time to the delivery consent task and the time elapsed since the start of the experiment were calculated for each participant. This served to determine whether participant performance improved throughout the experiment. The maximum R² was 0.05, indicating that there was no learning effect.
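
The per-participant check described above can be sketched as follows. The analysis in the study was run in SPSS; this plain-Python version only illustrates the computation, and the sample values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for one participant: elapsed time (s) and response time (s).
elapsed_s = [40, 80, 120, 160, 200]
response_time_s = [2.1, 1.9, 2.2, 2.0, 2.1]

r = pearson_r(elapsed_s, response_time_s)
r_squared = r ** 2   # share of response-time variance explained by elapsed time
```

A small R² for every participant, as reported above, means that elapsed time explained little of the variance in response time.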

Detection Rate for Alarms

To test H1-1 and H1-2, GLMMs were fitted separately on the detection rate of visual and auditory alarms. Fixed effect factors included SOA, alarm position, and their interaction. Participant was set as a random effect factor. The fixed effects are shown in Table 3. Significant main effects were found for SOA for both visual and auditory alarms, whereas alarm position was significant only for auditory alarms. The model coefficients are shown in Table 4. Compared to single alarms, detection rates for visual alarms were lower when the SOA was between 200 and 1000 ms. Detection rates for auditory alarms were lower at all SOAs except 600 ms. With respect to alarm position, auditory alarms in second place were more likely to be missed. Figure 3 shows the pairwise comparisons (using Wald’s test) for each SOA*position combination and for single alarms.

Table 3.

GLMM of SOA * position on alarm detection rate—Fixed effects.

	Source	F	df1	df2	p
Visual	Corrected Model	6.960	10	2,224	.000*
	SOA	4.871	4	2,224	.001*
	Position	2.404	1	2,224	.121
	SOA*Position	1.181	4	2,224	.317
Auditory	Corrected Model	13.009	10	2,254	.000*
	SOA	7.324	4	2,254	.000*
	Position	17.714	1	2,254	.000*
	SOA*Position	0.304	4	2,254	.876

Table 4.

GLMM of SOA * position on alarm detection rate—Model estimates (non-significant interaction terms not included).

	Model Term	Coefficient	Std. Error	T	P	95% CI lower	95% CI upper
Visual	Intercept	1.281	1.669	0.768	.443	−1.992	4.555
	SOA = 1200	−0.399	0.215	−1.853	.064	−0.820	0.023
	SOA = 1000	−0.671	0.202	−3.327	.001*	−1.066	−0.275
	SOA = 800	−1.144	0.192	−5.962	.000*	−1.520	−0.768
	SOA = 600	−1.086	0.192	−5.665	.000*	−1.462	−0.710
	SOA = 200	−0.512	0.213	−2.407	.016*	−0.929	−0.095
	Second	−0.017	0.261	−0.065	.948	−0.528	0.494
	First	0
	Single (ref.)	0
Auditory	Intercept	1.780	1.592	1.118	.264	−1.343	4.902
	SOA = 1200	−0.524	0.209	−2.510	.012*	−0.933	−0.115
	SOA = 1000	−0.736	0.210	−3.497	.000*	−1.149	−0.323
	SOA = 800	−1.225	0.211	−5.811	.000*	−1.638	−0.811
	SOA = 600	−0.299	0.235	−1.269	.204	−0.760	0.163
	SOA = 200	−1.110	0.196	−5.657	.000*	−1.495	−0.725
	Second	−0.492	0.251	−1.957	.050*	−0.984	0.001
	First	0
	Single (ref.)	0
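
Because the models use a logit link, the coefficients in Table 4 are on the log-odds scale. The sketch below converts the visual-alarm SOA coefficients to odds ratios relative to the reference category (single alarms) and recomputes approximate Wald intervals as coefficient ± 1.96 standard errors. The normal approximation is an assumption here; SPSS may use a slightly different critical value, and the function names are hypothetical.

```python
import math

# (coefficient, standard error) copied from Table 4, visual alarms.
VISUAL_SOA_TERMS = {
    1200: (-0.399, 0.215),
    1000: (-0.671, 0.202),
    800: (-1.144, 0.192),
    600: (-1.086, 0.192),
    200: (-0.512, 0.213),
}


def odds_ratio(coef: float) -> float:
    """Odds of detection relative to the reference category (single alarms)."""
    return math.exp(coef)


def wald_interval(coef: float, se: float, z: float = 1.96) -> tuple:
    """Approximate 95% interval on the log-odds scale (normal approximation)."""
    return coef - z * se, coef + z * se


for soa, (coef, se) in VISUAL_SOA_TERMS.items():
    low, high = wald_interval(coef, se)
    print(f"SOA {soa} ms: OR = {odds_ratio(coef):.2f}, CI = [{low:.3f}, {high:.3f}]")
```

For example, the coefficient for an SOA of 800 ms corresponds to odds of detection of roughly one third of those for a single alarm, conditional on the participant random effect.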

Figure 3.

Detection rates for alarms as a function of position and SOA (*p < .05 when compared with single alarms).

Identification Accuracy for Alarms

GLMMs were fitted separately on the identification accuracy for visual and auditory alarms. Fixed effect factors included SOA, alarm position, and their interaction. Participant was set as a random effect factor. The fixed effects are shown in Table 5. SOA and alarm position showed significant main effects and a significant interaction on identification accuracy for visual, but not for auditory, alarms. The model coefficients for visual alarms are shown in Table 6, indicating that accuracy was lower for visual alarms when the SOA was 200, 600, or 1000 ms, but higher for a second visual alarm when the SOA was 600 ms. Figure 4 shows the pairwise comparisons (using Wald’s test) for each SOA*position condition and for single alarms.

Table 5.

GLMM of SOA * position on alarm accuracy—Fixed effects.

	Source	F	df1	df2	p
Visual	Corrected Model	4.703	10	1,490	.000*
	SOA	4.736	4	1,490	.001*
	Position	13.712	1	1,490	.000*
	SOA*Position	3.787	4	1,490	.005*
Auditory	Corrected Model	0.927	10	1,618	.507
	SOA	1.628	4	1,618	.165
	Position	0.034	1	1,618	.854
	SOA*Position	0.189	4	1,618	.944

Table 6.

GLMM of SOA * position on visual alarm accuracy—Model estimates (non-significant interaction terms not included).

	Model Term	Coefficient	Std. Error	T	p	95% CI lower	95% CI upper
Visual	Intercept	1.337	1.470	0.910	.363	−1.547	4.221
	SOA = 1200	0.281	0.303	0.925	.355	−0.314	0.875
	SOA = 1000	−0.974	0.235	−4.147	.000*	−1.435	−0.513
	SOA = 800	−0.119	0.278	−0.427	.670	−0.663	0.426
	SOA = 600	−0.645	0.248	−2.599	.009*	−1.131	−0.158
	SOA = 200	−0.550	0.257	−2.14	.033*	−1.054	−0.046
	Second	0.385	0.328	1.173	.241	−0.259	1.030
	First	0
	Single (ref.)	0
	SOA = 600*Second	1.277	0.545	2.345	.019*	0.209	2.345

Figure 4.

Accuracy for alarms as a function of position and SOA (*p < .05 when compared with single alarms).

Cross-modal Effects in Alarm Detection and Accuracy

In order to test H1-3, multiple GLMMs were fitted on the detection rate and accuracy of alarms: each GLMM consisted of a paired comparison between two sub-columns in Table 7. For example, to compare the forward masking effect of the first visual versus auditory alarm on the second visual alarm, GLMMs were built on the detection rate and accuracy of the second visual alarms, with the modality of the preceding alarm as the fixed effect factor, and participant as the random effect factor. A significant difference between intra- and cross-modal pairs was observed only when the first alarm was a visual alarm. The detection rate and accuracy for these visual alarms were both lower when they were followed by an auditory alarm, compared to another visual alarm (F(1, 748) = 3.998, p = .046 and F(1, 456) = 4.192, p = .041, respectively).

Table 7.

Detection rate and accuracy for alarms when masked by the same or different modality (*p < .05 when compared with the other modality pair).

The masked alarm	Visual		Auditory
Forward-masked by (when preceded by)	Visual	Auditory	Visual	Auditory
Detection rate (%)	65.9	63.2	56.8	60.0
Accuracy (%)	79.2	85.2	91.1	93.6
Backward-masked by (when followed by)	Visual	Auditory	Visual	Auditory
Detection rate (%)	62.9*	56.0*	69.1	68.5
Accuracy (%)	75.5*	66.8*	93.0	93.0

Alarm Flood Analysis

To test H1-4, GLMMs were fitted on the detection rate and accuracy for all alarms. Workload (routine vs. flood) was set as a fixed effect factor and participant was set as a random effect factor. The overall detection rate for all alarms was significantly lower during alarm floods, compared to routine operations (64.6% vs. 98.7%; F(1, 4498) = 86.352, p < .001). Accuracy was also significantly lower during alarm floods (85.0% vs. 90.1%; F(1, 3128) = 9.066, p = .003; see Figure 5).

Figure 5.

Detection rate and accuracy during alarm floods versus routine periods (*p < .05).

Delivery Consent Task

Across all scenarios, 98.7% of responses to the delivery consent task were accurate. However, the response time to the delivery consent task varied as a function of workload (routine vs. flood) and flood scenario (early flood, late flood, flood only). An LMM was fitted to the response time with workload and flood scenario as fixed effect factors, and participant as a random effect factor (see Table 8). Response times were significantly longer during alarm floods, compared to routine operations. There was also a main effect of flood scenario and an interaction effect between workload and scenario, such that response times were shorter during the alarm flood in late flood scenarios, compared to early flood and flood only scenarios (see Figure 6).

Table 8.

LMM of workload * scenario on delivery task response time - Fixed effects.

Source	df1	df2	F	p
Intercept	1	1426	2014.966	.000*
Workload	1	1426	101.808	.000*
Scenario	2	1426	4.900	.008*
Workload * Scenario	2	1426	7.427	.001*

Figure 6.

Response time to delivery consent as a function of workload and scenario. (*p < .05, main effects are not shown in the figure; see the methods section for scenario definitions).

Air Traffic Control Monitoring Task

The detection rate for the air traffic control monitoring task was significantly lower during the alarm floods, compared to the routine periods (14.8% vs. 91.3%, respectively). A simple Chi-square test was performed to assess the difference (χ²(1) = 522.209, p < .001, see Figure 7). Since most of the target messages were missed during the floods, there were insufficient data for an analysis of response time.

Figure 7.

Accuracy for the air traffic control monitoring task as a function of workload and scenario.

Discussion

The current study demonstrated asynchronous masking effects for both visual and auditory alarms with relatively long SOAs between alarm signals in a complex multitasking environment. Detection performance suffered for both the first and second visual and auditory alarms in an alarm pair. In other words, both forward masking (the masking stimulus precedes the target stimulus) and backward masking (the masking stimulus follows the target stimulus) were observed (H1-1 and H1-3 were supported). Previous studies have shown that both forms of masking interrupt the processing of visual stimuli at very short SOAs, ranging from 0 to 200 ms (Bachmann & Francis, 2013; Breitmeyer & Ogmen, 2000; Eriksen, 1966; Ogmen et al., 2003). At SOAs shorter than 150 ms, backward masking was shown to be more detrimental to noticing a visual stimulus than forward masking (Bachmann & Francis, 2013). In contrast, in the present study, the effects of forward and backward masking were nearly equally strong for visual alarms. This observation may be explained by earlier findings showing that the effect of backward masking decreases faster than that of forward masking as the SOA increases (Schiller, 1966). Thus, it can be expected that forward and backward masking may have comparable effects when the SOA is as long as 1000 ms (H1-2 was supported). For auditory stimuli, it has been reported that forward masking has a stronger effect than backward masking (Elliott, 1962; Wilson & Carhart, 1971). This was confirmed in the current study.

While the effects of forward and backward masking were comparable for the detection of alarms, identification was affected more strongly by backward masking, such that accuracy was lower for the first visual alarm in an alarm pair. This difference may be attributed to the required responses. In the current study, participants did not need to redirect their gaze when responding and could continue to monitor the central window where the second alarm would appear. Thus, performance for the second alarm benefited once the participant had noticed the appearance of the first alarm and their visual attention was allocated to the central window. However, the first alarm may have suffered as a result of participants’ attention being focused on other tasks/windows at the time. Furthermore, the identification of an alarm required color perception which is rather poor in peripheral vision as it relies on regions of the retina where fewer cones (capable of color discrimination) are located (Wickens et al., 1998). Auditory alarms are omnidirectional; that is, they do not require a particular eye or head orientation; as a result, effects of spatial position were not observed for these alarms.

The comparison of intra- and cross-modal alarm pairs revealed that a visual alarm was backward-masked more strongly by a subsequent auditory alarm than by a subsequent visual alarm. This effect was not observed with other modality pairs. This finding is consistent with previous research on interruption management, which showed that an auditory signal is more effective than a visual one at capturing attention, at the expense of an ongoing visual task, a phenomenon referred to as auditory preemption (Spence & Driver, 2017; Wickens et al., 2005; Wickens & Liu, 1988). The finding could also be explained by the fact that auditory signals have an advantage in terms of storage in short-term memory. A large body of literature suggests that, in an immediate recall task, performance is consistently better with auditory signals, compared to visual ones. This difference is greatest for the most recent item (e.g., Conrad & Hull, 1968). This modality effect has been attributed to a special type of short-term memory called the pre-categorical acoustic storage (PAS), which stores information for the most recent item before recognition; its visual counterpart decays too fast to be utilized in a recall task (Craik, 1969; Crowder & Morton, 1969).

The alarm floods in this study greatly undermined participants’ ability to perform their three tasks (H1-4 was supported). The detection rate for alarms plummeted from 98.7% to 64.6%. Response times to delivery consent requests almost doubled (see Figure 6), and most air traffic control messages with the call sign “DOI51” were missed (the detection rate dropped to 14.8%; see Figure 7). These performance decrements indicate a strong competition for attentional resources. For the delivery consent task, this competition resulted in increased response times because the drone waited until the participant responded. On the other hand, the (auditory) air traffic control messages were transient and had to be noticed immediately. These messages were more likely to be missed than auditory alarms because they were embedded in a stream of very similar (in terms of acoustic characteristics such as pitch and timbre) ATC messages and thus less salient than the auditory alarms (Huang & Elhilali, 2017; Shinn-Cunningham, 2008).

Experiment 2: Establishing the Relative Contributions of Simultaneous and Asynchronous Masking

One of the main goals of this line of research is to examine and support the detection and identification of alarms in an alarm flood. To date, very few studies have examined detection and identification performance for more than two alarms. The only study found by the authors was published by Boot et al. (2007), who studied the detection of up to 4 visual targets in a radar monitoring task, with SOAs ranging from 0 to 300 ms. They reported main effects of both SOA and the number of targets in temporal proximity. Detection performance was worse with more targets and at shorter SOAs. An interaction effect between SOA and the number of targets was also reported, such that the effect of SOA on target detection became more pronounced with an increase in the number of targets. Note that, in that study, participants were asked to report the total number of targets, but they did not have to identify or discriminate between targets. When discrimination is required, a natural question is whether the position of an alarm in a series affects performance for that alarm. Such serial position effects have been observed in recall tasks, where recall was better for the first and last stimuli (Laming, 2010; Murdock, 1962).

The present study went beyond earlier research in that it examined the role of both asynchronous masking (which was the focus of Experiment 1) and simultaneous masking (which becomes more likely with the large numbers of alarms in an alarm flood) in the presence of up to 6 alarms and during an alarm flood. The goals of the current study were:

(1) To compare the detection and identification of multiple visual and auditory alarms that are presented concurrently or in close temporal proximity;

(2) To investigate the effect of the number of alarms on their detection and identification;

(3) For alarms that are presented in close temporal proximity (but not simultaneously), to establish the effect of their serial position on their detection and identification.

The following hypotheses were generated based on findings from previous studies:

• H2-1: Alarm detection rate and accuracy will be lower with concurrent alarms than with sequential alarms. This effect is expected to be even more pronounced for auditory alarms.

• H2-2: Alarm detection rate and accuracy will deteriorate as the number of alarms increases. This effect will be stronger for concurrent alarms.

• H2-3: The first and last alarms in a sequence of alarms will have higher accuracy than other alarms in the same sequence.

Method

Participants

The participants in this study were 15 students recruited from the College of Engineering at the University of Michigan. The participants were between 20 and 35 years old (mean age = 23.7 years, SD = 3.1 years; 11 males and 4 females) and had self-reported normal or corrected-to-normal vision. They were also required to have self-reported normal hearing ability and color vision and could not have participated in Experiment 1. This research complied with the American Psychological Association Code of Ethics and was approved by the Institutional Review Board at the University of Michigan (UM IRB: HUM00144319). Informed consent was obtained from each participant.

Apparatus and Tasks

The apparatus used in the current study was similar to that used in Experiment 1, which consisted of a computer with a 24-inch LCD monitor, a pair of stereo speakers, a mouse, a keyboard, and an audio recorder. A set of driving pedals (part of the Logitech® MOMO® Racing suite) was added to the setup.

Participants were again required to perform three tasks: delivery consent, alarm monitoring, and air traffic control monitoring. The delivery consent task was the same as in Experiment 1. The alarm monitoring task differed from the one described in Experiment 1 in that it included concurrent alarms. Visual alarms were presented in the center of the screen as red (HSL: 5/94%/45%), green (HSL: 101/94%/45%), or blue (HSL: 204/94%/45%; same saturation and brightness) squares in the small grid cell closest to the affected drone. Auditory alarms consisted of a male, female, or child's voice stating the number of the affected drone. These colors and voices were chosen to ensure that they were equivalent in salience but also easily differentiable from each other. Upon detection of a visual or auditory alarm, the participants were required to press the space key on the keyboard as soon as possible (to record response time) and then verbally report the type of the alarm (R for red, B for blue, G for green; M for male, F for female, and K for kid), followed by the number of the drone.

The alarms were presented either as single alarms or as groups of multiple alarms (alarm clusters) that appeared either concurrently or closely spaced in time. When a cluster of alarms was presented sequentially, the SOA was set at 800 ms, which was shown to be the most difficult interval for alarm detection in Experiment 1. Table 9 shows all combinations of alarms in this experiment. Note that, for auditory alarms, the number of concurrent alarms was limited to two because extensive pilot testing with various types of auditory signals showed that it was too difficult for participants to detect and differentiate more than two auditory alarms.

Table 9.

Types of alarm clusters.

Number of alarms	Visual		Auditory		Cross-modal
	Concurrent	Sequential	Concurrent	Sequential	Concurrent	Sequential
2	Included	Included	Included	Included	1V+1A	1V1A or 1A1V
4	Included	Included		Included
6	Included	Included		Included

The air traffic control (ATC) monitoring task was also slightly modified. The ATC recording and the call sign “DOI51” were the same as in Experiment 1; however, instead of pressing the space key, participants were required to press the foot pedal as soon as possible. This change was made to avoid confusion between the responses to alarms and call signs.

Experiment Design and Procedure

The various types of alarms and alarm clusters in this study were introduced in the previous section and summarized in Table 9. The number of alarms was either 1, 2, 4, or 6, the cluster type was either concurrent or sequential, and the modality of each alarm was visual or auditory. The experiment also varied workload (routine operation vs. alarm flood) and scenario (early flood, late flood, flood only).

Similar to Experiment 1, the current experiment included three 20-minute scenarios, which were the early flood, late flood, and flood only scenarios (see Figure 2). The early flood and late flood scenarios consisted of 17 minutes of low alarm frequency periods and a 3-minute alarm flood (high alarm frequency). The alarm flood only scenario consisted of 17 minutes without any alarms and the 3-minute alarm flood. A total of 15 alarm clusters or single alarms (including one of each possible alarm cluster listed in Table 9, one single visual alarm, and one single auditory alarm) were presented during the low alarm frequency periods of the early and late flood scenarios. During each alarm flood, a total of 30 alarm clusters or single alarms (two of each possible item) were presented. The frequencies of the delivery consent task and the air traffic control messages were the same as in Experiment 1.
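The item counts given above can be checked by enumerating the conditions in Table 9 together with the two single alarms. The following is a minimal sketch for illustration, not the authors' scenario generator; the function names and the tuple layout are hypothetical.

```python
from itertools import product

SOA_MS = 800  # SOA used for sequential clusters in Experiment 2


def enumerate_conditions():
    """List every alarm condition from Table 9 plus the two single alarms."""
    conditions = []
    # Visual clusters: 2, 4, or 6 alarms, concurrent or sequential.
    for size, kind in product((2, 4, 6), ("concurrent", "sequential")):
        conditions.append(("visual", kind, size))
    # Auditory clusters: concurrent presentation was limited to two alarms.
    conditions.append(("auditory", "concurrent", 2))
    for size in (2, 4, 6):
        conditions.append(("auditory", "sequential", size))
    # Cross-modal clusters always contain exactly two alarms.
    conditions.append(("cross-modal 1V+1A", "concurrent", 2))
    conditions.append(("cross-modal 1V1A", "sequential", 2))
    conditions.append(("cross-modal 1A1V", "sequential", 2))
    # Single alarms, one per modality.
    conditions.append(("visual", "single", 1))
    conditions.append(("auditory", "single", 1))
    return conditions


def onset_times_ms(kind, size, soa_ms=SOA_MS):
    """Onset of each alarm relative to the start of its cluster."""
    step = soa_ms if kind == "sequential" else 0
    return [i * step for i in range(size)]


conditions = enumerate_conditions()
routine_items = len(conditions)    # one of each item per routine period
flood_items = 2 * len(conditions)  # two of each item per alarm flood
print(routine_items, flood_items)
print(onset_times_ms("sequential", 4))
```

Under these assumptions, the enumeration yields 15 items for the low alarm frequency periods and 30 for each flood, matching the totals reported in the text.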

Upon arrival in the laboratory, participants provided informed consent and were trained as in Experiment 1. After the training, the participants completed three 20-minute scenarios, with optional 5-minute breaks between scenarios. Cross-modal matching was performed before each session using the same method as described in Experiment 1 (see Pitts et al., 2016). At the end of the experiment, participants completed a debriefing questionnaire asking them to estimate their own task performance, rate task difficulties, and provide other feedback.

Dependent Measures

The dependent measures were the same as in Experiment 1 except for those of the alarm monitoring task. Detection rate was measured at the cluster level, defined as the percentage of alarm clusters that were correctly enumerated. Response time was also measured at the cluster level, as the time from the onset of the alarm cluster/single alarm to the time when the participant first pressed the space key. Accuracy was defined at the alarm level, as the percentage of correctly reported alarms out of all alarms.
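The three alarm monitoring measures can be illustrated with a short sketch. It assumes that "correctly enumerated" means the participant reported the same number of alarms as were presented, and that an alarm counts as correct when its label (type letter plus drone number) appears in the report. The record layout, function names, and example trials are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusterTrial:
    presented: List[str]                # labels shown, e.g. ["R3", "B5"]
    reported: List[str]                 # labels the participant reported
    onset_ms: float                     # onset of the cluster or single alarm
    first_keypress_ms: Optional[float]  # first space-key press, None if absent


def detection_rate(trials):
    """Cluster level: percentage of clusters with the correct alarm count."""
    hits = sum(len(t.reported) == len(t.presented) for t in trials)
    return 100.0 * hits / len(trials)


def accuracy(trials):
    """Alarm level: percentage of presented alarms that were reported."""
    correct = sum(label in t.reported for t in trials for label in t.presented)
    total = sum(len(t.presented) for t in trials)
    return 100.0 * correct / total


def mean_response_time_ms(trials):
    """Cluster level: mean time from cluster onset to first key press."""
    rts = [t.first_keypress_ms - t.onset_ms
           for t in trials if t.first_keypress_ms is not None]
    return sum(rts) / len(rts)


# Made-up example trials (not study data).
trials = [
    ClusterTrial(["R3", "B5"], ["R3", "B5"], 1000.0, 1900.0),
    ClusterTrial(["M1", "F4", "K7", "M2"], ["M1", "F4", "K2", "M2"], 5000.0, 6100.0),
    ClusterTrial(["G8"], [], 9000.0, None),
]
print(detection_rate(trials), accuracy(trials), mean_response_time_ms(trials))
```

Note that the second example trial counts as detected (four alarms reported for four presented) even though one alarm was misidentified, which is why detection rate and accuracy can diverge.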

Results

Similar to Experiment 1, the data were analyzed using SPSS. Detection rate and accuracy were recorded as binary true/false data and analyzed using the generalized linear mixed model (GLMM, with a binary logit link) in SPSS. Response time was analyzed using the linear mixed model (LMM) in SPSS. For all analyses, the significance level was set at p < .05.

As in the previous experiment, Pearson’s correlation coefficients between the response time to alarms and the time elapsed since the start of the task were calculated for each participant to assess whether performance improved throughout the experiment. The maximum R² was 0.04, indicating that there was no learning effect.
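The learning-effect check described above amounts to computing one correlation per participant and inspecting the largest squared value. A minimal sketch follows; the function names and the example data are hypothetical and serve only to show the computation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def max_r_squared(per_participant):
    """Largest R^2 across participants; values near 0 suggest no learning."""
    return max(pearson_r(elapsed, rt) ** 2
               for elapsed, rt in per_participant.values())


# Made-up data: (time elapsed in s, response time in ms) per participant.
example = {
    "P01": ([60, 300, 600, 900, 1150], [900, 1010, 870, 980, 930]),
    "P02": ([45, 280, 610, 880, 1100], [1100, 1040, 1150, 1020, 1120]),
}
print(max_r_squared(example))
```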

Detection Rate for Alarm Clusters

To test H2-1 and H2-2, the following analyses were conducted: GLMMs were fitted separately on the detection rate for visual and auditory alarm clusters (cross-modal alarm clusters excluded). Fixed effect factors included cluster size, cluster type, and their interaction (interaction was not calculated for auditory clusters, since no more than two concurrent auditory alarms were tested). Participant was set as a random effect factor. The fixed effects are shown in Table 10; the model coefficients are shown in Table 11. Significant main effects were found for cluster size for both visual and auditory alarms. The detection rate for an alarm cluster decreased as the number of alarms in the cluster increased. An interaction effect for cluster size and cluster type was observed such that the detection rate for auditory alarm clusters was significantly lower with concurrent (as opposed to sequential) alarms whereas the detection rate for concurrent visual alarms was lower only when six alarms were presented. Figure 8 shows the pairwise comparisons for each cluster type*size condition and single alarms; the comparison was performed using Wald’s test.

Table 10.

GLMM of cluster size*cluster type on alarm cluster detection rate - Fixed effects.

	Source	F	df1	df2	p
Visual	Corrected Model	10.117	6	833	.000*
	Cluster size	15.648	2	833	.000*
	Cluster type	1.001	1	833	.317
	Size*type	3.312	2	833	.037*
Auditory	Corrected Model	11.432	4	595	.000*
	Cluster size	16.530	1	595	.000*
	Cluster type	6.483	2	595	.002*
	Size*type	N/A	0

Table 11.

GLMM of cluster size*cluster type on alarm cluster detection rate - Model estimates (non-significant interaction terms not included).

	Model Term	Coefficient	Std. Error	t	p	95% CI lower	95% CI upper
Visual	Intercept	4.418	1.242	3.556	.000*	1.979	6.856
	Size = 6	−3.454	0.753	−4.587	.000*	−4.932	−1.976
	Size = 4	−1.485	0.811	−1.830	.068	−3.077	0.108
	Size = 2	−0.966	0.856	−1.128	.260	−2.646	0.715
	Type = Sequential	−0.199	0.632	−0.314	.753	−1.439	1.042
	Type = Concurrent	0
	Single (ref.)	0
	Size = 6*Type = Seq.	1.425	0.727	1.959	.050*	−0.003	2.853
Auditory	Intercept	4.981	1.481	3.362	.001*	2.072	7.891
	Size = 6	−3.297	1.036	−3.182	.002*	−5.332	−1.262
	Size = 4	−2.412	1.060	−2.275	.023*	−4.495	−0.330
	Size = 2	0
	Type = Sequential	−0.000	1.423	−0.000	1.00	−2.794	2.794
	Type = Concurrent	−4.178	1.028	−4.066	.000*	−6.196	−2.160
	Single (ref.)
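Because these models use a logit link, the coefficients in Table 11 are on the log-odds scale. As an illustration only, the sketch below converts sums of fixed-effect coefficients for the visual model into detection probabilities for a participant whose random intercept is zero. It assumes additive dummy coding with single alarms as the reference level; this coding is inferred from the table layout rather than stated in the text, and the helper names are hypothetical.

```python
import math


def inv_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Visual-model estimates copied from Table 11.
INTERCEPT = 4.418          # single alarm (reference level)
SIZE_6 = -3.454
TYPE_SEQUENTIAL = -0.199
SIZE_6_X_SEQUENTIAL = 1.425

# Assumed additive combination of terms (concurrent type coefficient is 0).
p_single = inv_logit(INTERCEPT)
p_six_concurrent = inv_logit(INTERCEPT + SIZE_6)
p_six_sequential = inv_logit(
    INTERCEPT + SIZE_6 + TYPE_SEQUENTIAL + SIZE_6_X_SEQUENTIAL
)

print(round(p_single, 3), round(p_six_concurrent, 3), round(p_six_sequential, 3))
```

These are conditional estimates for an average participant, so they need not coincide with the observed mean detection rates plotted in Figure 8.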

Figure 8.

Detection rates for alarms as a function of cluster type and size. (*p < .05 compared to single alarms).

Accuracy for Individual Alarms

To test H2-1 and H2-2, GLMMs were fitted separately on the accuracy of visual and auditory alarms (cross-modal alarm clusters were excluded). Fixed effect factors included cluster size, cluster type, and their interaction (again, interaction was not calculated for auditory clusters). Participant was set as a random effect factor. The fixed effects are shown in Table 12; model coefficients are shown in Table 13. Similar to the findings for detection rates, the accuracy for alarms in both visual and auditory alarm clusters decreased as the cluster size increased. Also, an interaction effect was observed showing that the accuracy for auditory alarms was significantly lower with concurrent alarm clusters, an effect observed for visual alarms only for a cluster size of six. Figure 9 shows the pairwise comparisons for each cluster type*size condition and single alarms; the comparison was performed using Wald’s test.

Table 12.

GLMM of cluster size*cluster type on alarm accuracy - Fixed effects.

	Source	F	df1	df2	p
Visual	Corrected Model	41.645	6	2,993	0.000*
	Cluster size	14.544	1	2,993	0.000*
	Cluster type	66.339	2	2,993	0.000*
	Size*type	8.897	2	2,993	0.000*
Auditory	Corrected Model	19.387	4	1,795	0.000*
	Cluster size	57.301	1	1,795	0.000*
	Cluster type	26.023	2	1,795	0.000*
	Size*type	N/A	0

Table 13.

GLMM of cluster size*cluster type on alarm accuracy - Model estimates (non-significant interaction terms not included).

	Model Term	Coefficient	Std. Error	t	p	95% CI lower	95% CI upper
Visual	Intercept	2.916	1.516	1.923	.055	−0.057	5.888
	Size = 6	−2.320	0.244	−9.497	.000*	−2.798	−1.841
	Size = 4	−1.188	0.256	−4.643	.000*	−1.690	−0.687
	Size = 2	0
	Type = Sequential	−0.461	0.433	−1.065	.287	−1.310	0.388
	Type = Concurrent	−0.360	0.437	−0.824	.410	−1.218	0.497
	Single (ref.)	0
	Size = 6*Type = Seq.	1.198	0.341	3.511	.000*	0.529	1.867
Auditory	Intercept	5.258	1.774	2.964	.003*	1.778	8.738
	Size = 6	−2.311	0.323	−7.154	.000*	−2.945	−1.678
	Size = 4	−1.992	0.331	−6.022	.000*	−2.641	−1.343
	Size = 2	0
	Type = Sequential	−1.900	1.057	−1.799	.072	−3.973	0.172
	Type = Concurrent	−4.510	1.024	−4.404	.000*	−6.519	−2.502
	Single (ref.)	0

Figure 9.

Accuracy for alarms as a function of cluster type and size.

To test H2-3, accuracy data were broken down further (see Figures 10 and 11). With a cluster size of 6 visual alarms or 4/6 auditory alarms, the first and last alarms in a cluster were more likely to be correctly identified than the other alarms. To test this effect, GLMMs were fitted on the accuracy for individual visual and auditory alarms in clusters of 6 sequential alarms. The position of an alarm in the cluster was set as a fixed effect factor, and participant was set as a random effect factor. When the first alarm was used as the reference level, the 3rd, 4th, and 5th visual alarms, as well as the 2nd to 5th auditory alarms, showed significantly lower accuracy (see Table 14).

Figure 10.

Accuracy for visual alarms as a function of cluster size and alarm position.

Figure 11.

Accuracy for auditory alarms as a function of cluster size and alarm position.

Table 14.

GLMM of alarm position on alarm accuracy - Model estimates.

	Model Term	Coefficient	Std. Error	t	p	95% CI lower	95% CI upper
Visual, Size = 6, Type = Seq.	Intercept	2.053	1.528	1.343	.180	−0.947	5.053
	Position = 6	−0.070	0.375	−0.188	.851	−0.807	0.666
	Position = 5	−1.127	0.347	−3.248	.001*	−1.808	−0.446
	Position = 4	−1.079	0.348	−3.105	.002*	−1.761	−0.397
	Position = 3	−0.780	0.352	−2.213	.027*	−1.472	−0.088
	Position = 2	−0.330	0.365	−0.905	.366	−1.046	0.386
	Position = 1 (ref.)	0
Auditory, Size = 6, Type = Seq.	Intercept	2.609	1.607	1.624	.105	−0.546	5.764
	Position = 6	−0.667	0.414	−1.611	.108	−1.480	0.146
	Position = 5	−1.943	0.393	−4.940	.000*	−2.716	−1.171
	Position = 4	−2.235	0.393	−5.686	.000*	−3.007	−1.463
	Position = 3	−1.637	0.395	−4.140	.000*	−2.413	−0.861
	Position = 2	−1.364	0.399	−3.422	.001*	−2.147	−0.582
	Position = 1 (ref.)	0
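The position coefficients in Table 14 are log-odds relative to the first alarm in the cluster, so exponentiating them gives odds ratios that may be easier to interpret. The sketch below performs that conversion on the tabled values; the variable and function names are hypothetical.

```python
import math

# Coefficients copied from Table 14 (position 1 is the reference level).
visual_coef = {2: -0.330, 3: -0.780, 4: -1.079, 5: -1.127, 6: -0.070}
auditory_coef = {2: -1.364, 3: -1.637, 4: -2.235, 5: -1.943, 6: -0.667}


def odds_ratios(coefs):
    """Odds of a correct report at each position relative to position 1."""
    return {position: math.exp(b) for position, b in coefs.items()}


visual_or = odds_ratios(visual_coef)
auditory_or = odds_ratios(auditory_coef)

# Position with the lowest odds of a correct report in each modality.
worst_visual = min(visual_or, key=visual_or.get)
worst_auditory = min(auditory_or, key=auditory_or.get)

for position in sorted(visual_or):
    print(position, round(visual_or[position], 2), round(auditory_or[position], 2))
```

An odds ratio well below 1 for the middle positions and close to 1 for the last position reflects the pattern described above, in which the first and last alarms were reported more accurately than the others.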

Response Time for Alarms

LMMs were fitted to the response time for visual and auditory alarm clusters separately (cross-modal alarm clusters were excluded). Fixed effect factors included cluster size, cluster type, and their interaction (again, interaction was not calculated for auditory clusters). Participant was set as a random effect factor. The fixed effects are shown in Table 15; the model coefficients are shown in Table 16 (note that the auditory model is not shown since none of the terms was significant). The response time to visual alarms in a cluster increased significantly with cluster size. Figure 12 shows the pairwise comparison for each cluster type*size condition and single alarms; the comparisons were performed using paired t-tests.

Table 15.

LMM of cluster size*cluster type on alarm response time - Fixed effects.

	Source	F	df1	df2	p
Visual	Corrected Model	1405.508	1	833	.000*
	Cluster size	8.049	2	833	.000*
	Cluster type	0.766	1	833	.382
	Size*type	1.679	2	833	.187
Auditory	Corrected Model	1230.622	1	595	.000*
	Cluster size	2.659	2	595	.071
	Cluster type	2.765	1	595	.097
	Size*type	N/A	0

Table 16.

LMM of cluster size*cluster type on alarm response time - Model estimates (non-significant interaction terms not included).

	Model Term	Coefficient	Std. Error	df	t	p	95% CI lower	95% CI upper
Visual	Intercept	820.975	70.787	833	11.598	.000*	682.033	959.917
	Size = 6	396.267	100.108	833	3.958	.000*	199.773	592.761
	Size = 4	300.733	100.108	833	3.004	.003*	104.239	497.227
	Size = 2	0
	Type = Sequential	193.458	100.108	833	1.932	.054	−3.036	389.952
	Type = Concurrent	97.017	100.108	833	0.969	.333	−99.477	293.511
	Single (ref.)	0

Figure 12.

Response time to alarms as a function of cluster type and size. (*p < .05, compared to single alarms).

Cross-modal Effects for Alarm Detection Rate, Accuracy, and Response Time

There were three different types of cross-modal alarm clusters, all of which consisted of two alarms: concurrent visual–auditory, sequential visual–auditory, or sequential auditory–visual. Detection rate, accuracy, and response time did not differ significantly among the three types of alarm clusters (see Table 17).

Table 17.

Performance comparison between cross-modal alarm clusters.

Cluster type	Modality pair	Detection rate (%)	Accuracy (%)	Response time (ms)
Concurrent	Visual + Auditory	95.8	91.7	1037.3
Sequential	Visual + Auditory	95.0	91.3	944.9
Sequential	Auditory + Visual	94.2	90.0	1008.9

Alarm Flood Analysis

Similar to the alarm flood analysis in Experiment 1, GLMMs were fitted on the detection rate for all alarm clusters and the accuracy for all alarms. A GLM was fitted to the response time for all alarm clusters. In all three models, workload (routine vs. flood) was set as a fixed effect factor, and participant was set as a random effect factor. Detection rates did not differ significantly as a function of workload (F(1, 1798) = 1.519, p = .218). However, accuracy was significantly lower during the alarm floods (F(1, 5518) = 23.947, p < .001), and response time was significantly shorter (F(1, 1798) = 13.710, p < .001, see Figure 13).

Figure 13.

Performance comparison between alarm floods and low alarm frequency periods.

Discussion

The current experiment expanded on Experiment 1 which focused on asynchronous masking effects on the detection of single alarms and alarm pairs. In Experiment 2, concurrent alarms were introduced to study the effect of simultaneous masking, and the number of alarms in temporal proximity was increased from two to six. These changes were made to investigate performance in a context more representative of a real-world alarm flood.

The results from this study indicate that the detection rate and accuracy for visual alarms decreased as the number of alarms in a cluster increased. As expected, this performance decrement was more pronounced with concurrent alarms than with sequential ones (H2-2 was supported). This finding is in agreement with the few studies that, to date, have compared synchronous and asynchronous masking. For example, Beanland and Pammer (2012) investigated attentional blink (asynchronous) and inattentional blindness (the failure to notice unexpected visual targets when attention is engaged with other targets; synchronous) in two experiments. Although the tasks in these studies were not entirely equivalent in nature, the authors reported detection rates of 26% for the inattentional blindness task and 62% for the attentional blink task. This suggests a much stronger performance effect of simultaneous masking. A more equivalent comparison was made by Boot et al. (2007), who examined various numbers of targets and SOAs in the context of a target detection task using a simulated radar monitoring interface. They reported lower detection rates with simultaneous targets than with sequential targets, especially when the number of targets exceeded three. Detection rates in their study were comparable for pairs of targets with SOAs ranging from 0 to 300 ms. Similarly, in the present study, participants’ performance deteriorated rapidly with an increase in cluster size for concurrent visual alarm clusters; however, their performance was very similar for concurrent and sequential alarm clusters when the cluster size was small (2 alarms). One possible explanation for these findings is that two concurrent visual alarms are processed as one visual stimulus, thus avoiding or reducing competition for attentional resources (Akyürek & Hommel, 2005; Shapiro et al., 2006). For 4-alarm clusters, the detection rate remained similar, but accuracy was lower for concurrent alarms. This result is consistent with findings from studies on enumeration of targets. As mentioned earlier, these studies have shown that it is possible for people to report the number of targets very quickly when that number is no greater than four (Mandler & Shebo, 1982; Revkin et al., 2008). This ability, termed subitizing as opposed to counting, has been likened to the recognition of figural patterns that contain small numbers of targets (Von Glasersfeld, 1982). Since subitizing is based on processing targets as one figural pattern, it does not support the identification and differentiation of multiple alarm signals. As a result, the detection rate for 4-alarm concurrent visual alarm clusters was comparable to that for sequential alarm clusters because the participants could quickly subitize the number of alarms, but the accuracy for 4-alarm concurrent visual alarm clusters was lower than for sequential alarm clusters because subitizing does not support identification.

One difference between the visual and auditory modalities that was observed in this study was that both detection rate and accuracy were higher for auditory alarms in the case of 2-alarm clusters but decreased more rapidly with an increase in cluster size for sequential auditory alarm clusters, compared to visual ones (H2-1 was supported). There are two possible explanations for this modality difference. It could result from the smaller capacity of auditory working memory, compared to visual working memory (Saults & Cowan, 2007), and/or it could be attributed to the increased interference between the auditory presentation of the stimulus and the required verbal response. Generally, there is a tendency toward better performance when stimulus and response are compatible, that is, share the same processing code (Wickens et al., 1984). However, this benefit may disappear due to the temporal overlap between stimulus and response when the number of alarms increases.

The auditory concurrent alarm clusters led to the worst performance among all cluster types (an average detection rate of only 67.5%), even though these clusters consisted of only two alarms. As mentioned before, auditory alarms are more susceptible to simultaneous masking because their detection suffers not only due to attentional limitations, but also because of acoustic interference between different sounds. Unlike visual perception, which can attend to spatially independent objects at the same time, auditory perception relies on receptors that are highly sensitive to differences in frequency (Bolton et al., 2018; Greenwood, 1971). The alarm signals used in the current study were close in their base frequency and identical in their duration; as a result, the simultaneous masking effect was very strong. Similar problems could arise in many application domains (such as medical operations) where auditory displays or alarms are widely used but not necessarily well-coordinated as they are associated with systems that are developed by different companies (Momtahan et al., 1993). On the other hand, performance for cross-modal alarm pairs, whether concurrent or sequential, was relatively good, suggesting that the auditory masking effects occurred only intramodally. This interpretation is supported by Multiple Resources Theory (Wickens, 2008).

For both visual and auditory alarms, the response time was longer with larger alarm clusters and also increased more rapidly with concurrent alarm clusters. Because response time was measured at the cluster level, the response time to a cluster is essentially the response time to the first alarm in the cluster. The processing of the following or other concurrent alarm signals delayed the response. This delay was not reported in previous studies as they did not ask participants to respond as quickly as possible (Beanland & Pammer, 2012; Boot et al., 2007). However, this delay is worrisome when viewed not in absolute terms (e.g., 821 ms for single visual alarms vs. 1314 ms for six concurrent visual alarms) but as a 60% increase in response time. In real-world environments where operators juggle multiple, more complex tasks, absolute response times tend to be longer, and a 60% increase can result in operationally relevant delays.
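The response times and the 60% figure quoted above can be reproduced from the fixed-effect estimates for visual alarms in Table 16, assuming the terms combine additively with single alarms as the reference level. The short sketch below shows the arithmetic; the variable names are hypothetical.

```python
# Visual-model estimates copied from Table 16 (all in ms).
INTERCEPT_MS = 820.975        # single visual alarm (reference level)
SIZE_6_MS = 396.267           # additional time for a cluster of six
TYPE_CONCURRENT_MS = 97.017   # additional time for concurrent presentation

single_ms = INTERCEPT_MS
six_concurrent_ms = INTERCEPT_MS + SIZE_6_MS + TYPE_CONCURRENT_MS

# Relative increase from single alarms to six concurrent alarms.
increase_pct = 100.0 * (six_concurrent_ms - single_ms) / single_ms

print(round(single_ms), round(six_concurrent_ms), round(increase_pct, 1))
```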

For clusters consisting of 4 or 6 sequential alarms, the first and last alarms were reported more accurately than other alarms in the same cluster, thus confirming H2-3. This was true for both visual and auditory alarms. These results are consistent with the pattern observed in free recall tasks, where subjects are presented with a series of items and required to recall the items, regardless of the order of presentation. Items closer to the beginning or the end of the sequence are more likely to be recalled. These effects were termed the primacy effect and the recency effect, respectively (Murdock, 1962). The primacy effect was explained as a result of covert rehearsal of the items in the sequence. The closer an item is to the beginning of the sequence, the more it gets rehearsed and possibly enters long-term memory. Therefore, the first items in a sequence are more likely to be recalled (Fischler et al., 1970; Laming, 2010). The recency effect was attributed to the fact that these items had just entered short-term memory and were thus more accessible (Tzeng, 1973). There were two ways in which the participants reported a cluster of multiple alarms: reporting immediately after each alarm, or waiting until the end of the cluster to avoid reporting while alarms were still being presented. With the first strategy, the report of one alarm overlapped and interfered with the presentation of the following alarms, thus giving the early alarms an advantage because they were less affected by such interference. With the second strategy, the early alarms had a similar advantage to those in the free recall studies because they were rehearsed more. The advantage of the last alarm was also comparable to that in the free recall studies because it was the last item to enter short-term memory and was therefore easier to retrieve. Therefore, in larger clusters, attention was not the only resource that limited report accuracy. The capacity of short-term memory could also play an important role in this process.

The comparison between alarm floods and routine operations yielded results that were mostly consistent with those from Experiment 1. During the alarm floods, the detection rate was slightly, but not significantly, lower; accuracy was lower; and response times were shorter. These results confirm that participants change their response strategy when exposed to very large numbers of alarms in close temporal proximity, leading to a speed-accuracy tradeoff. To keep up with the pace of the alarm flood, they sped up their responses, but this was achieved at the cost of accuracy. This strategy could lead to potentially catastrophic misdiagnoses of critical alarms during an alarm flood in a real-world setting.

Finally, limitations of the reported studies should be noted. The participants in these experiments were students whose ability to multitask and process complex information may be different from that of well-trained professional operators. Larger sample sizes would be desirable to improve validity and statistical power. Also, some aspects of the tasks may not be representative of real-world operations but had to be adjusted for experimental reasons (e.g., verbal responses were required to reduce the interference between manual responses and visual scanning). Future studies in more representative settings are needed to further investigate these questions.

Conclusion

The goal of this research was to establish whether and to what extent findings from basic research on masking apply to information processing in more complex multitask settings where the phenomenon can result in operators missing critical alarms that appear in close temporal proximity. Two experiments were conducted to measure the detection and identification performance for visual and auditory alarms in the context of a simulated automated package delivery system. The results from Experiment 1 indicated that asynchronous masking was indeed observed for both visual and auditory alarms. An SOA of 800 ms was shown to be particularly detrimental to detection performance. Experiment 2 included simultaneous alarm presentation and alarm clusters that consisted of two, four, or six alarms. The simultaneous presentation of six visual alarms resulted in a much higher risk of missed or misidentified alarms. For auditory alarms, any number of concurrent signals was very difficult to detect and identify. Overall, when compared to routine operations, alarm floods led to lower detection rates, lower accuracy, and shorter response times to alarm signals. Also, performance of the package delivery monitoring tasks suffered during alarm floods: response time to the delivery consent task nearly doubled and over 90% of the ATC messages were missed.

These results confirm that masking is experienced, albeit at longer SOAs, and can lead to breakdowns in alarm detection and identification in complex data-rich domains. To reduce the risk of missed and misinterpreted alarms, intelligent alarm systems should be designed with these insights into human attention limitations in mind. For example, increasing SOAs between closely spaced critical alarms to prevent masking can complement previous efforts which have focused on adjusting the threshold of alarms (the value at which alarms are triggered) based on the context of system operations (Pollard, 2010; Schmid et al., 2017; Welch, 2011). Such systems are “adaptive” to the state of the system but not to the needs of the human operator. Adopting a human-centered approach to alarm design will contribute to increased safety in a range of application domains, such as process control, aviation, and medicine.
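One way to picture the SOA-based recommendation is a presentation queue that delays an alarm whenever it would otherwise start too soon after the previous one. The sketch below is illustrative only: the function name, the `min_soa_ms` parameter, and the example value of 1200 ms are assumptions rather than values recommended by this study, and a real system would also need to weigh alarm priority and the cost of delaying a critical alarm.

```python
def space_alarms(trigger_times_ms, min_soa_ms):
    """Return presentation onsets with at least min_soa_ms between onsets.

    Alarms are presented in trigger order; an alarm that would start too
    soon after the previous one is delayed just enough to meet the minimum.
    """
    onsets = []
    previous_onset = None
    for trigger in sorted(trigger_times_ms):
        if previous_onset is None:
            onset = trigger
        else:
            onset = max(trigger, previous_onset + min_soa_ms)
        onsets.append(onset)
        previous_onset = onset
    return onsets


# Hypothetical burst: two simultaneous alarms, two closely spaced, one late.
triggers_ms = [0, 0, 300, 800, 5000]
spaced_ms = space_alarms(triggers_ms, min_soa_ms=1200)
print(spaced_ms)
```

The trade-off in this design is that spacing reduces the opportunity for masking at the price of presenting later alarms after a delay, which grows with the size of the burst.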

Key Points

Alarm floods are likely to occur when malfunctions arise in complex and highly coupled systems. Masking has been known to lead to attention limitations in the detection of visual and auditory signals, yet its effects in complex, multitasking environments remain unclear. Two experiments were conducted using a simulated drone supervisory control system to investigate this issue.

Alarm detection and identification performance was affected when alarms were presented in close temporal proximity. Overall, an SOA of 800 ms led to the worst detection performance.

When multiple alarms were presented concurrently, clusters of six visual alarms proved to be much more difficult to detect and identify, as did any number of concurrent auditory alarms. In a cluster of multiple sequential alarms, the first and last alarms were easier to detect and identify.

Alarm systems should be designed with these temporal limitations in mind and should avoid presenting alarms in ways that make them prone to being missed.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the University of Michigan.

Yuzhi Wan, Tencent Technology, Shanghai - Ph.D., Industrial and Operations Engineering, 2019, University of Michigan.

Nadine Sarter, Department of Industrial and Operations Engineering, University of Michigan - Ph.D., Industrial and Systems Engineering, 1994, Ohio State University.

References

Akyürek, E. G., & Hommel (2005). Target integration and the attentional blink. Acta Psychologica, 119(3), 305–314. https://doi.org/10.1016/j.actpsy.2005.02.006

Arnell, K. M. (2006). Visual, auditory, and cross-modality dual-task costs: Electrophysiological evidence for an amodal bottleneck on working memory consolidation. Perception & Psychophysics, 68(3), 447–457. https://doi.org/10.3758/BF03193689

Arnell, K. M., & Jolicœur (1999). The attentional blink across stimulus modalities: Evidence for central processing limitations. Journal of Experimental Psychology. Human Perception and Performance, 25(3), 630–648. https://doi.org/10.1037/0096-1523.25.3.630

Arnell, K. M., & Larson, J. M. (2002). Cross-modality attentional blinks without preparatory task-set switching. Psychonomic Bulletin & Review, 9(3), 497–506. https://doi.org/10.3758/BF03196305

Bachmann & Francis (2013). Visual masking: Studying perception, attention, and consciousness. Academic Press. https://doi.org/10.1016/B978-0-12-800250-6.00001-7

Beanland & Pammer (2012). Minds on the blink: The relationship between inattentional blindness and attentional blink. Attention, Perception, & Psychophysics, 74(2), 322–330. https://doi.org/10.3758/s13414-011-0241-4

Bolton, M. L., Edworthy, & Boyd, A. D. (2018). A formal analysis of masking between reserved alarm sounds of the IEC 60601-1-8 International medical alarm standard. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 62(1), 523–527. https://doi.org/10.1177/1541931218621119

Boot, W. R., Becic, & Kramer, A. F. (2007). Temporal limitations in multiple target detection in a dynamic monitoring task. Human Factors, 49(5), 897–906. https://doi.org/10.1518/001872007X230244

Breitmeyer, B. G., & Ogmen (2000). Recent models and findings in visual backward masking: A comparison, review, and update. Perception & Psychophysics, 62(8), 1572–1595. https://doi.org/10.3758/BF03212157

Breitmeyer, B. G., & Öğmen (2006). Visual masking: Time slices through conscious and unconscious vision (Issue 41). Oxford University Press.

Burr, D. C., Turi, & Anobile (2010). Subitizing but not estimation of numerosity requires attentional resources. Journal of Vision, 10(6), 20. https://doi.org/10.1167/10.6.20

Carhart, Tillman, T. W., & Greetis, E. S. (1969). Perceptual masking in multiple sound backgrounds. The Journal of the Acoustical Society of America, 45(3), 694–703. https://doi.org/10.1121/1.1911445

Colman, A. M. (2015). A dictionary of psychology. Oxford University Press.

Conrad & Hull, A. J. (1968). Input modality and the serial position curve in short-term memory. Psychonomic Science, 10(4), 135–136. https://doi.org/10.3758/BF03331446

Craik, F. I. M. (1969). Modality effects in short-term storage. Journal of Verbal Learning and Verbal Behavior, 8(5), 658–664. https://doi.org/10.1016/S0022-5371(69)80119-2

16.

Crowder

R. G.

Morton

(1969). Precategorical acoustic storage (PAS). Perception & Psychophysics, 5(6), 365–373. https://doi.org/10.3758/BF03210660

17.

Doll

T. J.

Hanna

T. E.

Russotti

J. S.

(1992). Masking in three-dimensional auditory displays. Human Factors: The Journal of the Human Factors and Ergonomics Society, 34(3), 255–265. https://doi.org/10.1177/001872089203400301

18.

EEMUA . (1999). Alarm systems: A guide to design, management and procurement. Engineering Equipment and Materials Users Association London.

19.

Elliott

L. L.

(1962). Backward and forward masking of probe tones of different frequencies. The Journal of the Acoustical Society of America, 34(8), 1116–1117. https://doi.org/10.1121/1.1918254

20.

Enns

J. T. J.

Di Lollo

(2000). What’s new in visual masking? Trends in Cognitive Sciences, 4(9), 345–352. https://doi.org/10.1016/S1364-6613(00)01520-5

21.

Eriksen

C. W.

(1966). Temporal luminance summation effects in backward and forward masking. Perception & Psychophysics, 1(2), 87–92. https://doi.org/10.3758/BF03210033

22.

Fastl

Zwicker

(2006). Psychoacoustics: Facts and models. Springer Science & Business Media, Vol. 22.

23.

Ferris

T. K.

Penfold

Hameed

Sarter

(2006). The implications of crossmodal links in attention for the design of multimodal interfaces: A driving simulation study. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 50(3), 406–409. https://doi.org/10.1177/154193120605000341

24.

Fischler

Rundus

Atkinson

R. C.

(1970). Effects of overt rehearsal procedures on free recall. Psychonomic Science, 19(4), 249–250. https://doi.org/10.3758/BF03328801

25.

Greenwood

D. D.

(1971). Aural combination tones and auditory masking. The Journal of the Acoustical Society of America, 50(2B), 502–543. https://doi.org/10.1121/1.1912668

26.

HSE . (1997). The explosion and fires at the Texaco Refinery, Milford Haven, 24 July 1994. HSE books.

27.

Huang

Elhilali

(2017). Auditory salience using natural soundscapes. The Journal of the Acoustical Society of America, 141(3), 2163–2176. https://doi.org/10.1121/1.4979055

28.

Laming

(2010). Serial position curves in free recall. Psychological Review, 117(1), 93–133. https://doi.org/10.1037/a0017839

29.

S. A.

(2014). Tactile and Crossmodal Change Blindness and its Implications for Display Design (Doctoral dissertation).

30.

Mandler

Shebo

B. J.

(1982). Subitizing: An analysis of its component processes. Journal of Experimental Psychology: General, 111(1), 1–22. https://doi.org/10.1037/0096-3445.111.1.1

31.

Momtahan

Hetu

Tansley

(1993). Audibility and identification of auditory alarms in the operating room and intensive care unit. Ergonomics, 36(10), 1159–1176. https://doi.org/10.1080/00140139308967986

32.

Murdock

B. B.

(1962). The serial position effect of free recall. Journal of Experimental Psychology, 64(5), 482–488. https://doi.org/10.1037/h0045106

33.

Ogmen

Breitmeyer

B. G.

Melvin

(2003). The what and where in visual masking. Vision Research, 43(12), 1337–1350. https://doi.org/10.1016/S0042-6989(03)00138-X

34.

Pavani

Turatto

(2008). Change perception in complex auditory scenes. Perception & Psychophysics, 70(4), 619–629. https://doi.org/10.3758/PP.70.4.619

35.

Perrow

(2011). Normal accidents: Living with high risk technologies. Princeton University Press.

36.

Pitts

Riggs

S. L.

Sarter

(2016). Crossmodal matching: A critical but neglected step in multimodal research. IEEE Transactions on Human-Machine Systems, 46(3), 445–450. https://doi.org/10.1109/THMS.2015.2501420

37.

Pollard

K. A.

(2010). Making the most of alarm signals: the adaptive value of individual discrimination in an alarm context. Behavioral Ecology, 22(1), 93–100. https://doi.org/10.1093/beheco/arq179

38.

Rensink

R. A.

(2002). Change Detection. Annual Review of Psychology, 53(1), 245–277. https://doi.org/10.1146/annurev.psych.53.100901.135125.

39.

Revkin

S. K.

Piazza

Izard

Cohen

Dehaene

(2008). Does subitizing reflect numerical estimation? Psychological Science, 19(6), 607–614. https://doi.org/10.1111/j.1467-9280.2008.02130.x

40.

Saults

J. S.

Cowan

(2007). A central capacity limit to the simultaneous storage of visual and auditory arrays in working memory. Journal of Experimental Psychology: General, 136(4), 663–684. https://doi.org/10.1037/0096-3445.136.4.663

41.

Schiller

P. H.

(1966). Forward and backward masking as a function of relative overlap and intensity of test and masking stimuli. Perception & Psychophysics, 1(3), 161–164. https://doi.org/10.3758/BF03210050

42.

Schmid

Goepfert

M. S.

Franz

Laule

Reiter

Goetz

A. E.

Reuter

D. A.

(2017). Reduction of clinically irrelevant alarms in patient monitoring by adaptive time delays. Journal of Clinical Monitoring and Computing, 31(1), 213–219. https://doi.org/10.1007/s10877-015-9808-2

43.

Shapiro

Schmitz

Martens

Hommel

Schnitzler

(2006). Resource sharing in the attentional blink. Neuroreport, 17(2), 163–166. https://doi.org/10.1097/01.wnr.0000195670.37892.1a

44.

Shinn-Cunningham

B. G.

(2008). Object-based auditory and visual attention. Trends in Cognitive Sciences, 12(5), 182–186. https://doi.org/10.1016/j.tics.2008.02.003

45.

Simons

D. J.

Levin

D. T.

(1997). Change blindness. Trends in Cognitive Sciences, 1(7), 261–267. https://doi.org/10.1016/S1364-6613(97)01080-2.

46.

Spence

Driver

(2017). Audiovisual links in attention: Implications for interface design. In Engineering psychology and cognitive ergonomics (pp. 185–192). Routledge.

47.

Stanton

N. A.

(1993). Operators reactions to alarms: Fundamental similarities and situational differences. In Proceedings of the Conference on Human Factors in Nuclear Safety (pp. 84–103), 22–23 April 1993. London: Le Meridien Hotel. https://doi.org/10.1201/9780203481974-9

48.

Tzeng

O. J. L.

(1973). Positive recency effect in a delayed free recall. Journal of Verbal Learning and Verbal Behavior, 12(4), 436–439. https://doi.org/10.1016/S0022-5371(73)80023-4

49.

Van der Burg

Brederoo

S. G.

Nieuwenstein

M. R.

Theeuwes

Olivers

C. N. L.

(2010). Audiovisual semantic interference and attention: Evidence from the attentional blink paradigm. Acta Psychologica, 134(2), 198–205. https://doi.org/10.1016/j.actpsy.2010.01.010

50.

Van Der Burg

Nieuwenstein

M. R.

Theeuwes

Olivers

C. N. L.

(2013). Irrelevant auditory and visual events induce a visual attentional blink. Experimental Psychology, 60(2), 80–89. https://doi.org/10.1027/1618-3169/a000174

51.

Von Glasersfeld

(1982). Subitizing: The role of figural patterns in the development of numerical concepts. Archives de Psychologie, 50(194), 191–218.

52.

Welch

(2011). An evidence-based approach to reduce nuisance alarms and alarm fatigue. Biomedical Instrumentation & Technology, 45(s1), 46–52. https://doi.org/10.2345/0899-8205-45.s1.46

53.

Wickens

C. D.

(2008). Multiple resources and mental workload. Human Factors: The Journal of the Human Factors and Ergonomics Society, 50(3), 449–455. https://doi.org/10.1518/001872008X288394

54.

Wickens

C. D.

Dixon

S. R.

Seppelt

(2005). Auditory preemption versus multiple resources: Who wins in interruption management? Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 49(3), 463–466. https://doi.org/10.1177/154193120504900353

55.

Wickens

C. D.

Gordon

S. E.

Liu

(1998). An introduction to human factors engineering. Longman.

56.

Wickens

C. D.

Liu

(1988). Codes and modalities in multiple resources: A success and a qualification. Human Factors, 30(5), 599–616. https://doi.org/10.1177/001872088803000505

57.

Wickens

C. D.

Vidulich

Sandry-Garza

(1984). Principles of S-C-R compatibility with spatial and verbal tasks: The role of display-control location and voice-interactive display-control interfacing. Human Factors: The Journal of the Human Factors and Ergonomics Society, 26(5), 533–543. https://doi.org/10.1177/001872088402600505

58.

Wilson

R. H.

Carhart

(1971). Forward and backward masking: Interactions and additivity. The Journal of the Acoustical Society of America, 49(4B), 1254–1263. https://doi.org/10.1121/1.1912488