Abstract
Up to 67% of accidents at Highway-Rail Grade Crossings (HRGCs) are due to motorists not stopping in time. To reduce the likelihood of vehicle-train accidents, the current study investigated the design of five novel visual and auditory in-vehicle notifications. An “Inform” notification was tested for “display time”; the “Slow,” “Intersection,” and “Stop” notifications were tested for “display time” and “speech length”; and an “Over Tracks” notification was tested for “speech length” and “presentation order.” Twenty-six participants viewed driving simulator recordings and rated the notifications in active rail crossing scenarios using Likert scales. The results showed that shorter speech was preferred with delayed notification displays, and longer speech was preferred with early notification displays. However, complex scenarios might require longer speech even with delayed displays, depending on additional variables such as driver speed or visibility. Additionally, in safety-critical scenarios, the tradeoff between a notification being sufficiently urgent and being too startling should be considered.
Keywords
Introduction
Accidents at Highway-Rail Grade Crossings (HRGCs) remain a serious concern, causing loss of life and productivity as well as medical and legal costs, which added up to nearly $40 million between 2008 and 2017 in the US alone (Hellman & Poirier, 2019). Within that same period, a total of 19,639 incidents were reported at different types of HRGCs, and 67% of them were due to motorists not stopping at an active HRGC (Hellman & Poirier, 2019). Driver behavior has long been cited as one of the main contributing factors to train-vehicle collisions (Federal Railroad Administration, 2016). In a recent study, only 20% of drivers reported checking left and right at active crossings, and 40% did so at passive crossings (Lautala et al., 2017). Despite the dangers, drivers tend to engage in risky behaviors such as trying to beat an approaching train. Familiarity with HRGCs can make drivers overconfident and is a contributing factor to vehicle-train accidents (Yeh & Multer, 2008).
Thus, the evidence suggests a need to prevent risky driving behaviors near HRGCs. To improve safety and reduce the likelihood of vehicle-train accidents at HRGCs, the current study investigated the design of novel visual and auditory in-vehicle alerts. Multimodal alerts may be effective in prompting drivers to take appropriate action as they approach an HRGC.
Related Work
In the past, earcons (short musical sounds; Blattner et al., 1989) and speech-based in-vehicle auditory alerts (IVAAs) have been shown to improve driver behavior at HRGCs (Nadri et al., 2023). A subjective evaluation study comparing earcons, speech alerts, and hybrid alerts showed greater ratings for acceptance, safety, and semantics (Nadri et al., 2021). In Nadri et al. (2023), a hybrid auditory alert composed of both an earcon and speech led to significantly lower speeds and greater force applied to the brake pedal when drivers approached HRGCs, compared to when no auditory alert was presented. Drivers were also more likely to look left and right before the HRGC when the alert was presented. Such auditory alerts have also been shown to assist drivers in navigating low-visibility conditions and urgent scenarios that require high mental workload (Marshall et al., 2007; Nees et al., 2015).
Visual alerts were tested as part of a Rail Crossing Violation Warning (RCVW) system (Neumeister et al., 2017; Withers & Utterback, 2021), a prototype developed to display visual warnings to drivers approaching active HRGCs. The RCVW system detects the presence of a train near an HRGC, identifies whether the grade crossing is activated, and then triggers a roadside unit to communicate with nearby connected vehicles, which display a visual alert to approaching drivers (Neumeister et al., 2017). After the prototype was developed, the system was tested in a field study in which in-vehicle RCVW alerts led to significantly lower approach speeds in the advance warning segment (Zhang et al., 2023).
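The detection-to-display chain described above can be sketched as a simple rule. This is a hypothetical illustration, not the RCVW implementation: the class `CrossingState`, the functions `roadside_unit_broadcasts` and `in_vehicle_alerts`, and the rule that a message is sent only when a train is detected and the crossing is activated are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CrossingState:
    """Hypothetical snapshot of an HRGC as seen by the roadside unit."""
    train_detected: bool   # a train has been sensed near the crossing
    crossing_active: bool  # gates, lights, and bells have been activated


def roadside_unit_broadcasts(state: CrossingState) -> bool:
    # Assumed rule: broadcast only when both conditions described in the text hold.
    return state.train_detected and state.crossing_active


def in_vehicle_alerts(state: CrossingState, vehicle_ids: list[str]) -> dict[str, str]:
    """Map each nearby connected vehicle to the visual alert it would display."""
    if not roadside_unit_broadcasts(state):
        return {}  # no broadcast, so no vehicle displays an alert
    return {vid: "RCVW visual alert" for vid in vehicle_ids}


if __name__ == "__main__":
    print(in_vehicle_alerts(CrossingState(True, True), ["car_1", "car_2"]))
    print(in_vehicle_alerts(CrossingState(True, False), ["car_1"]))
```

Keeping the roadside decision separate from the per-vehicle display step mirrors the two stages described in the text.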
In another study, an intelligent warning system was tested at a dual-track active HRGC; when the probability of a collision was determined to be “high-risk,” the system displayed a visual and auditory alert in the driver’s vehicle 30 seconds before its arrival at the HRGC (Wang et al., 2019). However, only two trials were conducted to test the system, and no effort was made toward designing the alerts themselves. It was not reported whether the system was evaluated with additional users to assess the effectiveness of the alerts (Wang et al., 2019). Standardizable in-vehicle alerts designed through careful human factors consideration remain an area that needs attention (Wullems et al., 2014). Such intelligent in-vehicle alerts should be investigated to quantify warning and reaction times, and to encourage visual scanning while accounting for human attentional capabilities and limitations (Wullems et al., 2014).
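As a rough sketch of the timing rule reported for Wang et al. (2019), an alert can be triggered once a high-risk approach is within 30 seconds of the HRGC. The risk model is not described here, so it is passed in as a boolean; the constant-speed time-to-arrival calculation and the names `time_to_arrival_s` and `should_alert` are assumptions.

```python
import math


def time_to_arrival_s(distance_m: float, speed_mps: float) -> float:
    """Seconds until the vehicle reaches the HRGC, assuming constant speed."""
    if speed_mps <= 0:
        return math.inf  # stopped or reversing: never arrives at this speed
    return distance_m / speed_mps


def should_alert(high_risk: bool, distance_m: float, speed_mps: float,
                 threshold_s: float = 30.0) -> bool:
    """Trigger the alert for a high-risk approach within threshold_s of the HRGC."""
    return high_risk and time_to_arrival_s(distance_m, speed_mps) <= threshold_s


print(should_alert(True, distance_m=600, speed_mps=20))  # 30 s away -> True
print(should_alert(True, distance_m=900, speed_mps=20))  # 45 s away -> False
```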
The current study makes two contributions. First, owing to the capabilities of connected vehicles, the auditory alerts can differ depending on the state of the HRGC. Second, although in-vehicle visual (RCVW) and auditory (IVAA) alerts have each been studied individually, and both have shown promise in improving driver behavior at HRGCs, their combination still requires further research; in the present work, a combination of visual and auditory alerts was developed and evaluated in a user study for perceived effectiveness. These ratings provide some insight into the design of combined audio-visual alerts.
Method
Five multimodal notifications were designed and investigated in the current study. The visual content of the notifications was adapted from the RCVW system (Neumeister et al., 2017), and the auditory content of each alert was adapted from past research on the IVAA system (Nadri et al., 2023). The five notifications were labeled “Inform,” “Slow,” “Intersection,” “Stop,” and “Over Tracks.” The visual displays are shown in Figure 1a–e, and the auditory displays are available at https://osf.io/qw6bt/?view_only=af3c641489a34d9c8c27da45d0653dbf.

Figure 1. (a)–(e) Visual displays of the notifications.
“Inform” notified drivers that they were approaching an HRGC. “Slow” asked drivers to slow down for the upcoming HRGC. “Intersection” notified drivers of an intersection immediately after the HRGC and instructed them to complete any turn only after clearing the HRGC. “Stop” was displayed to drivers who were traveling over the speed limit and needed to come to a stop quickly. “Over Tracks” was displayed when the driver had stopped on the tracks and needed to clear them immediately. All notifications were displayed at active HRGCs with activated gates, lights, and bells, and in the presence of a train.
Experimental Design
“Inform” was investigated using a single-factor design (Display Time: 20 vs. 40 seconds to arrival at the HRGC). “Slow,” “Intersection,” and “Stop” were investigated using a 2 (Display Time: 10 vs. 20 seconds to arrival at the HRGC) × 2 (Speech Length: Long vs. Short) design. “Over Tracks” was investigated using a 2 (Speech Length: Long vs. Short) × 2 (Presentation Order: Sequential vs. Simultaneous) design. All notifications followed a repeated-measures, within-subjects design. In Sequential presentation, the earcon was played first, followed by the speech; in Simultaneous presentation, the earcon played for the full duration of the speech. The audio for “Inform” contained no speech, and the audio for “Slow,” “Intersection,” and “Stop” was always presented Sequentially. Videos from the driver’s perspective approaching an active HRGC in a rural environment were recorded on a driving simulator. Figure 2 shows the experimental setup, and Figure 3 shows a screenshot from the videos.

Figure 2. Experimental setup.

Figure 3. Video screenshot with notification.
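The factorial structure described under Experimental Design can be enumerated to check how many videos it implies. The sketch below uses a hypothetical `build_conditions` helper and dictionary layout; it recovers the 18 videos mentioned in the Procedure (2 for Inform and 4 each for Slow, Intersection, Stop, and Over Tracks).

```python
from __future__ import annotations

from collections import Counter
from itertools import product


def build_conditions() -> list[dict]:
    """Enumerate the video conditions implied by the experimental design."""
    conditions = []
    # "Inform": single factor, Display Time only (no speech).
    for t in (20, 40):
        conditions.append({"notification": "Inform", "display_time_s": t})
    # "Slow", "Intersection", "Stop": 2 (Display Time) x 2 (Speech Length), always Sequential.
    for name in ("Slow", "Intersection", "Stop"):
        for t, speech in product((10, 20), ("Long", "Short")):
            conditions.append({"notification": name, "display_time_s": t,
                               "speech": speech, "order": "Sequential"})
    # "Over Tracks": 2 (Speech Length) x 2 (Presentation Order).
    for speech, order in product(("Long", "Short"), ("Sequential", "Simultaneous")):
        conditions.append({"notification": "Over Tracks", "speech": speech, "order": order})
    return conditions


videos = build_conditions()
print(len(videos))  # 18
print(Counter(v["notification"] for v in videos))
```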
Participants
Twenty-six participants (15 M; 11 F) were recruited for the study. Their mean age was 24.8 years (SD = 4.2 years), and they encountered a railroad crossing an average of 5.6 times (SD = 10.6) per month. The study was reviewed and approved by the Institutional Review Board (IRB).
Using a 5-point Likert scale (strongly disagree to strongly agree), participants rated each notification display on the following dimensions: Right Timing, Pleasantness, Urgency, Annoyance, Appropriateness, Unambiguity, Attention Capturing, Commanding, Startling, Willingness to Turn Off, and Distracting.
Procedure
After signing the consent form, participants completed a test drive in the driving simulator, during which they drove through two active HRGCs in a two-lane rural environment. The crossings were not activated, and no trains were present. The purpose of the test drive was to familiarize participants with the simulator environment. Before and after the test drive, participants completed a Simulator Sickness Questionnaire (SSQ) to screen for simulator sickness (Kennedy et al., 1993). Then, seated in the driving simulator, participants viewed all 18 pre-recorded notification videos on the simulator screens. Video presentation order was randomized for each participant. After viewing each video, participants provided their ratings.
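The text states only that video order was randomized for each participant. One reproducible way to do this is to seed a shuffle with the participant ID; the seeding scheme, the `base_seed` value, and the name `presentation_order` are assumptions, not the study's procedure.

```python
import random


def presentation_order(video_ids, participant_id, base_seed="hrgc-study"):
    """Return a per-participant random order of the videos.

    Seeding with the participant ID (a hypothetical scheme) makes each
    participant's order reproducible, while orders generally differ
    across participants.
    """
    rng = random.Random(f"{base_seed}-{participant_id}")
    order = list(video_ids)  # copy so the caller's list is not shuffled in place
    rng.shuffle(order)
    return order


videos = list(range(1, 19))  # 18 video IDs
for pid in (1, 2):
    print(pid, presentation_order(videos, pid))
```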
Results
Separate repeated measures ANOVAs were conducted on the Likert scale responses for each of the five multimodal notifications using SPSS v29.0.
Inform Warning
Display Time had a significant main effect on Right Timing (F [1, 25] = 4.55; p = .04; ƞp2 = .154) and Unambiguous (F [1, 25] = 5.33; p = .03; ƞp2 = .176) ratings. A Display Time of 20 seconds before arrival at the HRGC (M = 3.4; SD = 1.4) was rated significantly higher for Right Timing than 40 seconds (M = 2.7; SD = 1.6) (p = .04). However, a Display Time of 40 seconds (M = 2.9; SD = 1.4) was rated significantly more Unambiguous than 20 seconds (M = 2.5; SD = 1.3) (p = .03).
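The reported effect sizes can be cross-checked against the F statistics with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error), which follows from ηp² = SS_effect / (SS_effect + SS_error). The helper below, whose name is illustrative, is a minimal sketch that reproduces the Inform values from F(1, 25).

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F-ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Display Time effects on "Inform", F(1, 25), as reported above.
for label, f in (("Right Timing", 4.55), ("Unambiguous", 5.33)):
    print(label, round(partial_eta_squared(f, 1, 25), 3))  # 0.154, then 0.176
```

The same check applies to the other single-degree-of-freedom effects, for example F(1, 25) = 10.15 for Slow gives .289.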
Slow Warning
Display Time had a significant main effect on Pleasing (F [1, 25] = 5.87; p = .02; ƞp2 = .19), Willingness to Turn Off (F [1, 25] = 10.15; p = .004; ƞp2 = .289), and Distracting (F [1, 25] = 6.14; p = .02; ƞp2 = .197) ratings. Speech Length had a significant main effect on Annoyance (F [1, 25] = 7.01; p = .014; ƞp2 = .219). The 10-second Display Time (M = 3.6; SD = 1.2) was rated significantly more Pleasing than the 20-second Display Time (M = 3.2; SD = 1.1) (p = .023). The 10-second Display Time (M = 2.1; SD = 1.3) was also significantly less likely to be turned off than the 20-second Display Time (M = 2.8; SD = 1.5) (p = .004). The 10-second Display Time (M = 2.3; SD = 1.2) was also rated significantly less Distracting than the 20-second Display Time (M = 2.7; SD = 1.2) (p = .02). Lastly, Long Speech (M = 2.8; SD = 1.3) was rated significantly more Annoying than Short Speech (M = 2.2; SD = 1.2) (p = .014).
Intersection
Once again, Display Time had a significant main effect on Pleasing (F [1, 25] = 13.76; p = .001; ƞp2 = .355), with the 10-second display (M = 3.3; SD = 1.1) rated significantly more Pleasing than the 20-second display (M = 2.8; SD = 1.1) (p = .001). Significant Display Time × Speech Length interactions were also observed for Willingness to Turn Off (F [1, 25] = 4.97; p = .035; ƞp2 = .17) and Distracting (F [1, 25] = 11.25; p = .003; ƞp2 = .31). Post hoc paired samples t-tests with Bonferroni correction (α = .05/2 = .025) revealed that, with Long Speech, the 20-second Display Time (M = 3.6; SD = 1.4) was more likely to be turned off than the 10-second Display Time (M = 2.9; SD = 1.3) (t (25) = 3.41, p = .002). With Long Speech, the 20-second Display Time (M = 3.1; SD = 1.5) was also rated more Distracting than the 10-second Display Time (M = 2.3; SD = 0.9) (t (25) = 2.81, p = .009).
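The post hoc threshold α = .05/2 = .025 corresponds to a Bonferroni correction over two comparisons. The sketch below applies it to the two Intersection p-values reported above; treating those two comparisons as one family is an interpretation of the text, and the function names are hypothetical.

```python
from __future__ import annotations


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons


def bonferroni_decisions(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag each p-value as significant if it falls below alpha / m."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# The two post hoc comparisons reported for "Intersection" with Long Speech.
print(bonferroni_threshold(0.05, 2))         # 0.025
print(bonferroni_decisions([0.002, 0.009]))  # [True, True]
```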
Stop
A significant Display Time × Speech Length interaction was observed for Right Timing (F [1, 25] = 35.87; p < .001; ƞp2 = .589), Pleasing (F [1, 25] = 10.68; p = .003; ƞp2 = .299), Appropriateness (F [1, 25] = 18.06; p < .001; ƞp2 = .419), and Willingness to Turn Off (F [1, 25] = 10.19; p = .004; ƞp2 = .29). Post hoc paired samples t-tests with Bonferroni correction (α = .05/2 = .025) were conducted for each interaction. For Right Timing, with the 20-second Display Time, Long Speech (M = 4.1; SD = 0.9) was rated significantly higher than Short Speech (M = 2.0; SD = 1.3) (t (25) = 8.07, p < .001); with Short Speech, the 10-second Display Time (M = 4.1; SD = 1.1) was rated significantly higher than the 20-second Display Time (M = 2.0; SD = 1.3) (t (25) = 5.86, p < .001). For Pleasing, with the 20-second Display Time, Long Speech (M = 3.2; SD = 1.1) was rated significantly higher than Short Speech (M = 2.4; SD = 1.1) (t (25) = 3.24, p = .003); with Short Speech, the 10-second Display Time (M = 3.0; SD = 1.2) was rated significantly higher than the 20-second Display Time (M = 2.4; SD = 1.1) (t (25) = 2.87, p = .008). For Appropriateness, with the 20-second Display Time, Long Speech (M = 4.3; SD = 0.8) was rated significantly higher than Short Speech (M = 3.3; SD = 1.2) (t (25) = 4.2, p < .001); with Short Speech, the 10-second Display Time (M = 4.3; SD = 0.9) was rated significantly higher than the 20-second Display Time (M = 3.3; SD = 1.2) (t (25) = 3.83, p < .001). Lastly, for Willingness to Turn Off, with the 20-second Display Time, Short Speech (M = 3.1; SD = 1.5) was significantly more likely to be turned off than Long Speech (M = 2.4; SD = 1.4) (t (25) = 2.77, p = .010); with Short Speech, the 20-second Display Time (M = 3.1; SD = 1.5) was significantly more likely to be turned off than the 10-second Display Time (M = 2.0; SD = 1.3) (t (25) = 3.81, p < .001).
In addition, significant main effects of Display Time were observed for Annoying (F [1, 25] = 11.57; p = .002; ƞp2 = .316) and Distracting (F [1, 25] = 5.30; p = .030; ƞp2 = .175). The 20-second Display Time was rated significantly more Annoying (M = 3.1; SD = 1.2) than the 10-second Display Time (M = 2.9; SD = 1.3) (p = .002), and significantly more Distracting (M = 2.7; SD = 1.11) than the 10-second Display Time (M = 2.5; SD = 1.3) (p = .030).
Over Tracks
Speech Length had a significant main effect on Willingness to Turn Off (F [1, 25] = 4.71; p = .040; ƞp2 = .159), with Long Speech (M = 2.4; SD = 1.5) being significantly more likely to be turned off than Short Speech (M = 2.0; SD = 1.4) (p = .04). Presentation Order had a significant main effect on Pleasing (F [1, 25] = 4.34; p = .048; ƞp2 = .148), Urgent (F [1, 25] = 16.44; p < .001; ƞp2 = .397), Commanding (F [1, 25] = 5.73; p = .025; ƞp2 = .186), Startling (F [1, 25] = 7.27; p = .010; ƞp2 = .236), Annoying (F [1, 25] = 12.98; p = .001; ƞp2 = .340), and Distracting (F [1, 25] = 9.47; p = .005; ƞp2 = .275). Sequential was considered significantly more Pleasing (M = 2.6; SD = 1.2) than Simultaneous (M = 2.1; SD = 1.2) (p = .048). On the other hand, Simultaneous was considered significantly more Urgent (M = 4.7; SD = 0.9, and M = 4.3; SD = 0.8) (p < .001), Commanding (M = 4.6; SD = 0.8, and M = 4.2; SD = 1.0) (p = .025), Startling (M = 3.8; SD = 1.3, and M = 3.2; SD = 1.3) (p = .010), Annoying (M = 3.5; SD = 1.4, and M = 3.7; SD = 1.4) (p = .001), and Distracting (M = 3.5; SD = 1.4, and M = 2.8; SD = 1.3) (p = .005) than Sequential, respectively.
Discussion
The use of multimodal in-vehicle alerts is a novel approach to improving driver behavior at HRGCs. Three design factors were investigated: Display Time, or how long before reaching an HRGC the multimodal notification should be displayed; Speech Length, or how long the speech content of the notifications should be; and Presentation Order, or whether earcons are best presented simultaneously or sequentially with speech. These factors were investigated across five different multimodal notifications.
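To make the Presentation Order factor concrete, the sketch below lays out each condition as a timeline of (component, start, end) events. The earcon and speech durations are hypothetical, the function names are illustrative, and modeling the Simultaneous earcon as spanning the full speech follows the description in the Experimental Design section.

```python
from __future__ import annotations


def audio_schedule(order: str, earcon_s: float, speech_s: float) -> list[tuple[str, float, float]]:
    """Return (component, start_s, end_s) events for one notification."""
    if order == "Sequential":
        # Earcon first; speech starts when the earcon ends.
        return [("earcon", 0.0, earcon_s), ("speech", earcon_s, earcon_s + speech_s)]
    if order == "Simultaneous":
        # Earcon (for example, looped) spans the full duration of the speech.
        return [("earcon", 0.0, speech_s), ("speech", 0.0, speech_s)]
    raise ValueError(f"unknown presentation order: {order!r}")


def total_duration_s(events) -> float:
    """Time from the start of the first event to the end of the last one."""
    return max(end for _, _, end in events)


for order in ("Sequential", "Simultaneous"):
    events = audio_schedule(order, earcon_s=0.5, speech_s=2.0)
    print(order, events, total_duration_s(events))
```

Under these assumptions, Sequential presentation lengthens total playback by the earcon duration, whereas Simultaneous presentation delivers the speech immediately but overlaps it with the earcon.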
Notifications presented closer to the HRGC with shorter speech were preferred over notifications presented farther from the HRGC with longer speech. This was observed for the Slow and Stop notifications; it was also seen for Inform, though without the effect of Speech Length, since Inform contained no speech. This observation supports the need to present concise, relevant information about the immediate surroundings, which in turn helps maintain Level 1 situation awareness (SA), or perception of the environment (Endsley, 1995). Failure to maintain Level 1 SA is also the most common type of SA failure, and it in turn degrades Level 2 and Level 3 SA (Jones & Endsley, 1996). Loss of Level 1 SA has been attributed to relevant data not being available or presented information being misperceived (Jones & Endsley, 1996). In the current study, presenting relevant information in a compact form closer to the HRGC may have helped make important information available at the right time and reduced the likelihood of misunderstanding it. A similar pattern emerged for the Over Tracks notification, where shorter speech was preferred.
For complex scenarios such as a clear storage HRGC, where the Intersection notification might be displayed, Long Speech presented closer to the HRGC was preferred. This finding contrasts with those for Inform, Slow, and Stop, and suggests that pairing shorter speech with a shorter arrival time may not always be the best choice; instead, the best combination might depend on the scenario. This context dependency is consistent with Marshall et al. (2007), who noted the importance of considering environmental factors, or the driving context, when designing appropriate auditory alerts. With the Intersection notification, participants likely wanted to receive all available information about the scenario as they approached the HRGC, while also comparing that information with what they could directly observe at the HRGC. Complex scenarios could include HRGCs with clear storage and minimum track clearance distances; in such instances, highway traffic queues could extend across a railroad, steep grades could limit visibility, or traffic could include school buses or trucks carrying hazardous materials (Ogden & Cooper, 2019). Preemptive traffic signals or pre-signals have already been recommended for such scenarios (Ogden & Cooper, 2019), which is consistent with the Intersection notification findings.
For the Stop notification, which addresses scenarios where speeding becomes a danger, both Long Speech presented farther from the HRGC and Short Speech presented closer to the HRGC were rated favorably. The presence of speech in auditory alerts has been shown to lead to better memory recall (Nees et al., 2015), which suggests that such alerts could leave a lasting impression on drivers. Visual alerts have also been found to require greater effort and to be perceived as less annoying (Nees et al., 2015). It is possible that the visual component of the Stop notification sufficiently alerted participants in the current study, which may explain why both combinations (long speech presented farther from the HRGC and short speech presented closer to it) were rated favorably (Nees et al., 2015). However, the main effects showing that the 20-second Display Time was perceived as more Annoying and Distracting might suggest that participants preferred notifications with shorter speech presented closer to the HRGC. Once again, this reinforces the importance of maintaining Level 1 SA, where denser information presented closer to the situation better helps users perceive their immediate environment accurately (Jones & Endsley, 1996). Nonetheless, external factors such as visibility, weather conditions, and the presence or absence of trains should be investigated in future research.
Finally, for Over Tracks, although Simultaneous presentation was perceived as more Urgent and Commanding than Sequential presentation, it was also rated higher for being Startling, Annoying, and Distracting. Given the safety-critical nature of such scenarios, the tradeoff between a notification being sufficiently urgent and being too startling should be considered carefully (Marshall et al., 2001). According to the urgency mapping principle, an increase in perceived urgency is expected to be accompanied by an increase in perceived annoyance (Bliss et al., 1995; Hellier & Edworthy, 1999; Marshall et al., 2007), and this pattern was observed for the Over Tracks notification. The fact that Simultaneous presentation was perceived as more Startling, Annoying, and Distracting might indicate that it is a less appropriate display than Sequential presentation. However, Simultaneous presentation might still be appropriate: Marshall et al. (2007) reported that “appropriateness” and “urgency” were highly positively correlated (R² = .89) for more urgent scenarios, comparable to the Over Tracks scenario in the current study. This suggests that the context in which an alert is displayed matters, especially when the situation is safety critical, as in the Over Tracks scenario.
Conclusion
Findings from the current study will be used to refine the design of future in-vehicle multimodal alerts for research aimed at improving driver behavior and potentially reducing the likelihood of vehicle-train accidents at HRGCs. Future research will test the improved multimodal notifications under different conditions, such as active and passive HRGCs, varying visibility, and train presence, to evaluate their effectiveness in improving driver behavior.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project is made possible through Federal Railroad Administration, BAA Contract 693JJ622C000015 Multi-Site Simulation to Examine Driver Behavior Impact of Integrated Rail Crossing Violation Warning (RCVW) and In-Vehicle Auditory/Visual Alert (IVAA) System—PHASE 2.
Disclaimer
This document is disseminated under the sponsorship of the U.S. Department of Transportation’s Federal Railroad Administration in the interest of information exchange. The United States Government assumes no liability for its contents or use thereof. The United States Government does not endorse products or manufacturers. Trade or manufacturers’ names appear herein solely because they are considered essential to the objective of this report. The opinions and/or recommendations expressed herein do not necessarily reflect those of the U.S. Department of Transportation.
