Abstract
In a driving simulator study, six experts interacted with an in-vehicle voice assistant (VA) and rated different latencies. The results suggest that in order to maintain drivers’ satisfaction with the interaction, in-vehicle VAs should have a latency of no more than 5 s. A slight delay of 1.5 s was rated best, while shorter latencies caused the highest variance amongst experts, indicating that the fastest response may not necessarily be the most desirable. Satisfaction may also depend on the complexity of the use case, as experts showed higher tolerance toward longer latencies in the navigation domain. Furthermore, our results raise questions about the importance of considering human expectations and preferences in the design of in-vehicle VAs. Despite technological advancements, humans might still expect a natural delay, similar to that in a human-human interaction. These findings emphasize the need to balance using cutting-edge technology with the desire for familiar interactions.
Objective
As technology advances, voice assistants (VAs) are becoming more adept at answering and fulfilling user requests. However, with increasing complexity of user requests, the system might still require time to process and fulfill tasks. When the system takes too long to respond, users might consider the system to have failed (Porcheron et al., 2018). Consequently, users still rely on visual feedback to confirm the interaction with voice user interfaces (VUI). This is of particular concern for in-vehicle VAs, as they have the potential to enhance driving safety by lowering visual-manual workload while driving (Large et al., 2019). During driving, spatial and visual perception are required; secondary tasks should not further strain the resources already engaged in the driving task (Wickens, 1981, 2002). Akyol et al. (2001) suggest that verbal and auditory modalities are rarely used during driving. Therefore, VAs can be considered suitable for engaging in non-driving-related tasks. To benefit from the advantages of an in-vehicle VA, the VA’s performance must be reliable. If the VA takes too long to respond, drivers might shift their focus away from the road to check whether the VA is working properly. To avoid this unwanted shift of focus, identifying an appropriate latency of the VA is crucial. Additionally, an adequate latency can foster trust and acceptance in the system (Deng et al., 2024; Natarajan & Gombolay, 2020), ultimately improving the user experience (UX). Funk et al. (2020) emphasize the importance of timing for in-vehicle VAs, stating that both extremes, that is, very short and very long latencies, received the lowest satisfaction ratings. Nielsen (1993) defined 10 s as the limit for keeping the user’s attention focused; thus, latencies should not exceed this upper bound. Porcheron et al. (2018) state that users consider a silence of 4.5 to 5 s following a request as a failure of the system to comprehend them.
Conversely, a latency that is too short may result in dissatisfaction. The VA might cut off the user, which can trigger frustration or negative feelings toward the system (Funk et al., 2020). In summary, balancing the need for prompt responses against the risk of unnaturally short latencies is a challenge when designing VAs.
Given the lack of consensus in the literature on appropriate latencies in the automotive research context, we investigated satisfaction toward different latencies. Within the scope of an expert evaluation, we aimed at defining two latencies while driving: one that serves as a satisfactory latency and another that serves as the upper limit, which must not be exceeded, as users become dissatisfied beyond that point (i.e., VA is taking too long to respond). By defining these two poles, we provide criteria for systematically investigating latencies in the context of in-vehicle VAs. Our findings can have practical implications for both academia and industry, as they can inform the development of best practices and guidelines on appropriate latencies in the driving context.
Approach
We invited n = 6 experts from academia and industry to interact with an in-vehicle VA during a simulated drive in a static driving simulator. The experts engaged with the VA while carrying out three different use cases (UC) from the categories navigation, media, and communication (Table 1). The three UCs provide a comprehensive overview of the interaction logic and design of a vehicle’s infotainment system. The experts experienced each UC at 16 latencies ranging from 0.5 to 8 s in steps of 0.5 s. Latency is defined here as the time between the end of the user’s speech input and the beginning of the VA’s output. The order of UCs and latencies was permuted to counteract sequential effects.
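The condition set described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not specify how the order was permuted, so the sketch assumes a seeded random shuffle per expert, with latencies blocked within each UC. The names `USE_CASES`, `LATENCIES_S`, and `build_schedule` are hypothetical.

```python
import random

# The three UC categories and the 16 latencies (0.5 s to 8 s in 0.5 s steps)
# are taken from the study description.
USE_CASES = ["navigation", "media", "communication"]
LATENCIES_S = [0.5 * step for step in range(1, 17)]


def build_schedule(expert_id, seed=0):
    """Return a shuffled list of (use_case, latency_s) trials for one expert.

    Assumption: UC order is shuffled first, then the latencies are shuffled
    within each UC block. The actual permutation scheme is not reported.
    """
    rng = random.Random(seed * 1000 + expert_id)
    use_cases = USE_CASES[:]
    rng.shuffle(use_cases)
    schedule = []
    for use_case in use_cases:
        latencies = LATENCIES_S[:]
        rng.shuffle(latencies)
        schedule.extend((use_case, latency) for latency in latencies)
    return schedule
```

Under these assumptions, each expert completes 3 × 16 = 48 rated interactions, not counting any repeated trials.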
Table 1. Voice Command by UC-Category.
The experts drove on a highway while the experimenter sat in the passenger’s seat giving instructions about upcoming UCs. Experts activated the VA via the push-to-talk button on the steering wheel. The voice commands were printed and visible to the experts throughout the study.
After each interaction, the experimenter asked the expert to rate their satisfaction (“On a scale from 1 (very unsatisfied) to 7 (very satisfied), how satisfied were you with the latency of the voice assistant?”). The experimenter recorded the answers as they were given. In the event of a draw, that is, when an expert assigned identical ratings to multiple latencies, these latencies were repeated at the end of a UC trial. The experts then rated the respective latencies in a direct comparison with the aim of identifying a single most satisfactory latency.
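The draw check described above could be sketched as follows. The sketch assumes that only ties for the highest rating within a UC trigger a repetition, which is one reading of the stated aim of finding a single most satisfactory latency. The function name `top_rated_ties` and the example ratings are hypothetical.

```python
def top_rated_ties(ratings):
    """Return the latencies that share the highest rating, sorted ascending.

    `ratings` maps latency in seconds to one expert's 1-7 rating within a
    single UC. An empty list means the top rating is unique and no direct
    comparison is needed.
    """
    best = max(ratings.values())
    tied = sorted(latency for latency, rating in ratings.items() if rating == best)
    return tied if len(tied) > 1 else []


# Invented example values, not study data.
example = {1.0: 6, 1.5: 7, 2.0: 7, 2.5: 5}
to_repeat = top_rated_ties(example)  # latencies to present again for comparison
```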
Findings
We determined the median satisfaction for each latency across all six experts and three UCs. Median satisfaction ratings peaked at a latency of 1.5 s and decreased thereafter as latency increased (Table 2). Across all UCs, experts expressed dissatisfaction with a latency of 5 s. We therefore defined 1.5 s as a satisfactory latency and 5 s as the upper limit beyond which dissatisfaction occurred.
Table 2. Satisfaction Ratings for Each Latency Across All UCs (Median).
Note. Highest ratings and the first lowest rating within each UC domain are in bold.
For the two shortest latencies, that is, 0.5 and 1 s, we observed heterogeneity among the experts. While satisfaction ratings for the long latencies remained rather stable between experts, there was a wide range of ratings among the experts regarding the two shortest latencies with some experts being very satisfied and others very dissatisfied.
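The summary steps described above (median per latency, spread between experts, and the two reference latencies) can be sketched as follows. The ratings in the sketch are invented placeholders chosen only so that the code runs; they are not the study data. The threshold of 3 follows the "rather dissatisfied" boundary used in this paper, and all function names are hypothetical.

```python
from statistics import median

# Invented placeholder ratings (1-7 scale), one list per latency.
# These are NOT the study data.
ratings_by_latency = {
    0.5: [2, 7, 3, 6, 7, 2],
    1.5: [6, 7, 6, 7, 6, 7],
    3.0: [5, 4, 5, 5, 4, 5],
    5.0: [3, 3, 2, 3, 4, 3],
    8.0: [1, 1, 2, 1, 1, 1],
}

DISSATISFIED_AT_OR_BELOW = 3  # "rather dissatisfied" boundary on the 7-point scale


def summarize(ratings):
    """Return one row per latency with the median rating and the rating range."""
    rows = []
    for latency in sorted(ratings):
        values = ratings[latency]
        rows.append({
            "latency_s": latency,
            "median": median(values),
            "range": max(values) - min(values),
        })
    return rows


def best_latency(rows):
    """Latency with the highest median; the shortest one wins on equal medians."""
    return max(rows, key=lambda row: row["median"])["latency_s"]


def upper_limit(rows):
    """First latency beyond the best one whose median reaches the threshold."""
    best = best_latency(rows)
    for row in rows:
        if row["latency_s"] > best and row["median"] <= DISSATISFIED_AT_OR_BELOW:
            return row["latency_s"]
    return None


rows = summarize(ratings_by_latency)
```

The range is used here as a simple stand-in for the heterogeneity between experts; with only six ratings per cell, other spread measures would be equally defensible.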
We further analyzed satisfaction within each UC domain. Results indicated that within each domain, a latency of 1.5 s received the highest satisfaction (i.e., Md = 7 for navigation, Md = 6.5 for media, Md = 6 for communication). For the media and communication UCs, experts expressed dissatisfaction (i.e., Md = 3) after 4.5 s. For the navigation UC, experts rated a latency of 5 s as the first latency to be rather dissatisfying (i.e., Md = 3) (Figure 1). Thus, the results suggest that satisfaction with latency may vary depending on the UC domain, with experts being more tolerant of longer latencies in the navigation domain.

Figure 1. Satisfaction ratings for each latency and UC. The dotted line marks the boundary to a rating of “rather dissatisfied,” that is, Md = 3, or worse.
Conclusion
In a driving simulator study, six experts interacted with an in-vehicle VA and rated different latencies. The results suggest that in order to maintain drivers’ satisfaction with the interaction, in-vehicle VAs should have a latency of no more than 5 s. A slight delay of 1.5 s was rated best across all experts, indicating that the fastest response may not necessarily be the most desirable. Shorter latencies caused the highest variance amongst experts: Some were very satisfied whilst others were very unsatisfied. Satisfaction may also depend on UC complexity, as experts showed higher tolerance toward longer latencies in the navigation domain. Although the results at hand are preliminary and should be interpreted with caution, as they are based on the input of six experts, they can serve as a starting point for future research.
Furthermore, our results raise questions about the importance of considering human expectations and preferences in the design of in-vehicle VAs. Despite advancements in technology, humans might still expect a natural delay, similar to that in a human-human interaction (Doyle et al., 2019). This emphasizes the need to strike a balance between leveraging cutting-edge technology and meeting users’ desire for familiar interactions. In their study on response delays, Funk et al. (2020) stated that “timing is everything.” Our results support the importance of timing and highlight the need for further research on the impact of latency on attitudes, such as trust, among real in-vehicle VA users in order to benefit from the promising advantages of such systems.
Supplemental Material
sj-docx-2-pro-10.1177_10711813241260290 – Supplemental material for The Importance of Timing—An Expert Evaluation on Latencies for Voice Assistants
Supplemental material, sj-docx-2-pro-10.1177_10711813241260290 for The Importance of Timing—An Expert Evaluation on Latencies for Voice Assistants by Denise Sogemeier, Yannick Forster, Frederik Naujoks, Josef F. Krems and Andreas Keinath in Proceedings of the Human Factors and Ergonomics Society Annual Meeting
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
