
Abstract

This study examined real-world driver-pedestrian encounters to identify key interaction features and assess how the importance of these features is mediated by protection afforded by the environment. Using inverse reinforcement learning, we estimated the utility functions to evaluate the relative importance of different aspects of the interaction for each road user and how they differ between undesignated (e.g., jaywalking) and designated (e.g., zebra crossings) crossings. Pedestrian pausing behavior and dynamic features like acceleration changes and time gaps were important at designated crossings, whereas undesignated crossings relied on distances and bidirectional gaze, highlighting reliance on non-verbal cues. This work builds on previous studies analyzing the role of environmental features on interaction, communication, and negotiation between drivers and pedestrians. Understanding driver-pedestrian communication and identifying the most important interaction features may enhance the design of effective and coordinated driver-pedestrian interaction strategies, especially in the context of automated driving systems.

Keywords

driver pedestrian inverse reinforcement learning naturalistic interaction communication

Introduction

Road users communicate using various channels, including both implicit and explicit cues (Domeyer et al., 2019). However, prior work suggests that the vast majority of the communication between road users is in fact implicit (Lee et al., 2021). Studies have found that these implicit cues differ with roadway context, specifically between designated (e.g., stop signs and zebra crossings) and undesignated (e.g., jaywalking) crossing environments. Designated crossings were associated with greater and more prolonged mutual kinematic information exchange between drivers and pedestrians compared to undesignated crossing environments (Noonan et al., 2023). Undesignated crossings have been associated with greater interdependence between driver kinematics (e.g., variation in speed) and pedestrian wait times as well as between pedestrian kinematics and driver wait times (Domeyer et al., 2020; Noonan et al., 2022). In a recent study, Pipkorn et al. (2025) also found differences in communication patterns across different environments, showing that undesignated and designated crossings had a greater prevalence of bidirectional gaze behavior (i.e., instances in which the driver and pedestrian look at one another) compared to protected crossings. An outstanding question is: What are the key features (e.g., kinematics, temporal features, and bidirectional gazing) of these driver-pedestrian interactions, and how does their relative importance vary between drivers and pedestrians across different types of crossings?

As vehicles with advanced automation capabilities and more contextually aware systems are introduced, the ability to interpret and respond to subtle human signals may help to improve driver-pedestrian coordination and enhance capabilities for proactive decisions. Understanding how drivers and pedestrians coordinate and communicate non-verbally within a specific roadway context may support the development of intuitive and reliable communication.

The current study uses inverse reinforcement learning (IRL) as a framework to identify the key features underlying driver and pedestrian preferences and priorities that guide their behavior without requiring those preferences to be explicitly stated. By learning from observed actions in real-world traffic, IRL makes it possible to deduce the implicit decision-making features that drivers and pedestrians may use during their interaction.

The goal of this study is to estimate, through observation of real-world road user behavior, the features of driver-pedestrian interactions that may be preferred given the context of the interaction vis-à-vis the level of protection afforded by the infrastructure. One way of quantifying the importance of a particular action or behavior for a driver-pedestrian interaction is through utility (von Neumann et al., 1944). IRL was used to estimate the importance (i.e., contribution to the utility) of different features (i.e., behaviors or actions) to either the driver or pedestrian behaviors, dependent upon the type of crossing. These estimated utilities were then compared between crossing types and interaction partners to understand how they differ across categories.

Methods

We used inverse reinforcement learning (IRL) (Ng & Russell, 2000; Russell, 1998) to quantify utilities using naturalistic driving data as a foundation. The key idea of IRL is that, given observed behaviors in a variety of circumstances, it is possible to estimate the reward function that is being optimized by those behaviors. The power of this approach is that it can provide this reward function without inferring the policy (i.e., the mapping from states to actions) and is well-suited for complex and dynamic environments. Specifically, maximum entropy IRL is a way of estimating a reward function as a linear weighting of single values associated with features and correlated to their visitation frequency (Ziebart et al., 2008). The general approach of this study was to take expert trajectories observed from naturalistic encounters between drivers and pedestrians in two different environments, designated and undesignated crossings, and then infer a reward function using maximum entropy IRL. A post-hoc statistical analysis was then performed on the IRL output.
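For reference, the maximum entropy formulation can be written compactly. The expressions below follow the standard form of Ziebart et al. (2008) with a linear reward; the symbol $\mathcal{L}$ (the average log-likelihood of the demonstrations) and the discrete sum over candidate trajectories $\xi'$ are notational assumptions made here for illustration, not definitions taken from this study.

```latex
R(\xi; \theta) = \theta^{\top} f(\xi), \qquad
P(\xi \mid \theta) = \frac{e^{R(\xi; \theta)}}{\sum_{\xi'} e^{R(\xi'; \theta)}}, \qquad
\nabla_{\theta} \mathcal{L}(\theta) = \bar{f}(D_{M}) - \mathbb{E}_{P(\xi \mid \theta)}\left[ f(\xi) \right]
```

The gradient vanishes when the expected feature counts under the model match the observed feature counts, which is why the fitted weights $\theta$ can be read as the relative importance of each feature.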

Data Collection

Data were drawn from the ongoing MIT-AVT naturalistic driving data collection effort (Fridman et al., 2019). Vehicle-pedestrian interactions (i.e., road crossing) were initially detected using computer vision and then verified by trained behavioral coders. In total, 216 epochs were identified: 145 designated crossings and 71 undesignated crossings. Vehicle kinematic and video data were collected using an in-vehicle recording system. The kinematic state of the pedestrian (walking, standing, or pausing) and the gaze direction of the driver and pedestrian were manually annotated (see Noonan et al., 2022).

Feature Engineering

To capture the relevant factors influencing driver and pedestrian behavior, a number of features were engineered. Each feature is a mapping from the multidimensional state-space to a single real-valued number, ensuring the reward function associated with these features is correlated to their visitation frequency (Ziebart et al., 2008). The features specific to the driver were:

Velocity - representing the relative progress of the driver

$$f_{vel} = \frac{1}{N} \sum_{i=1}^{N} v_{i} \tag{1}$$

Acceleration change - identified using at most one change (AMOC) change point detection (Chen & Gupta, 2012)

$$f_{accel} = \frac{1}{N} \sum_{i=1}^{N} a_{i}^{2} \tag{2}$$

Walking, pausing, and standing - the proportion of the interaction the pedestrian spends walking, moving but not walking at a full pace (i.e., pausing), and not moving (i.e., standing)

$$f_{walk/pause/stand} = \frac{1}{N} \sum_{i=1}^{N} \begin{cases} 1, & \text{walk/pause/stand} \\ 0, & \text{else} \end{cases} \tag{3}$$

Time gap - the transformed time-to-intersection (TTI) to normalize values between 0 and 1

$$f_{timegap} = \frac{1}{N} \sum_{i=1}^{N} e^{- | TTI_{p, i} - TTI_{v, i} |} \tag{4}$$

Path distance - the square root of the sum of the squared driver and pedestrian distances to the intersection point. The “distance” is on the time headway scale for pedestrians and is physical distance for vehicles, as vehicles have a much higher deviation in velocity compared to pedestrians.

$$f_{dist} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{d_{v, i}^{2} + d_{p, i}^{2}} \tag{5}$$

Bidirectional glance, which is the proportion of time drivers and pedestrians spent looking in each other’s direction.

$$f_{bidir.glance} = \frac{1}{N} \sum_{i=1}^{N} \begin{cases} 1, & \text{bidir. glance} \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

All features were normalized to values between 0 and 1.
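The feature definitions in Equations 1 to 6 can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the per-frame input layout, and the use of min-max scaling for the final normalization are assumptions made here, and the toy values are invented rather than drawn from the study data.

```python
import math


def interaction_features(v, a, ped_state, tti_p, tti_v, d_v, d_p, mutual_gaze):
    """Per-encounter feature averages following Equations 1-6.

    All arguments are equal-length per-frame sequences (hypothetical layout):
    v, a          vehicle velocity and acceleration
    ped_state     "walk", "pause", or "stand" for each frame
    tti_p, tti_v  time-to-intersection of pedestrian and vehicle
    d_v, d_p      vehicle and pedestrian distance to the intersection point
    mutual_gaze   True when driver and pedestrian look toward each other
    """
    n = len(v)
    feats = {
        "vel": sum(v) / n,                                             # Eq. 1
        "accel": sum(x * x for x in a) / n,                            # Eq. 2
        "timegap": sum(math.exp(-abs(p - q))
                       for p, q in zip(tti_p, tti_v)) / n,             # Eq. 4
        "dist": sum(math.hypot(x, y) for x, y in zip(d_v, d_p)) / n,   # Eq. 5
        "bidir_glance": sum(1 for g in mutual_gaze if g) / n,          # Eq. 6
    }
    for state in ("walk", "pause", "stand"):                           # Eq. 3
        feats[state] = sum(1 for s in ped_state if s == state) / n
    return feats


def min_max_normalize(rows):
    """Rescale each feature to [0, 1] across encounters (assumed scaling)."""
    keys = list(rows[0].keys())
    lo = {k: min(r[k] for r in rows) for k in keys}
    hi = {k: max(r[k] for r in rows) for k in keys}
    return [
        {k: (r[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
         for k in keys}
        for r in rows
    ]


# Invented four-frame encounter, for illustration only.
toy = interaction_features(
    v=[10.0, 8.0, 6.0, 4.0],
    a=[0.0, -2.0, -2.0, -2.0],
    ped_state=["stand", "pause", "walk", "walk"],
    tti_p=[3.0, 2.0, 1.0, 0.0],
    tti_v=[3.0, 2.5, 2.0, 1.5],
    d_v=[30.0, 20.0, 12.0, 6.0],
    d_p=[4.0, 3.0, 1.5, 0.0],
    mutual_gaze=[False, True, True, False],
)
```

Because the time gap term is an exponential of a non-positive number, it already lies between 0 and 1 before any rescaling, while velocity, acceleration, and distance depend on the normalization step to reach that range.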

Sample-Based Maximum Entropy IRL

The relative weights of these features were estimated using a sample-based maximum entropy IRL algorithm adapted from previous IRL methodologies (Wu et al., 2020). The advantage of this method is that it allows for reward function estimation using only sampled data without a model of driving behavior. The details of the algorithm are shown in Table 1. The output of the algorithm is four sets of weights: one for the driver and one for the pedestrian in each environment (i.e., designated and undesignated crossings). These weights represent the relative importance of each feature to the driver and pedestrian interaction given a particular environment. A post-hoc statistical analysis using a Wilcoxon rank-sum test was performed to assess differences between crossing types, as this non-parametric test performed the best for these data, which come from a range of distributions.
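The post-hoc comparison can be illustrated with a small, self-contained rank-sum routine. This is a sketch under stated assumptions: the function name is hypothetical, the p-value uses the large-sample normal approximation with a tie correction, and W is reported as the rank sum of the first sample minus its minimum possible value (the Mann-Whitney convention), which may or may not match the software used in this study. The input values are invented.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (W, p), where W is the rank sum of x minus n1 * (n1 + 1) / 2.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted(
        (value, group) for group, sample in enumerate((x, y)) for value in sample
    )
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        # Find the block of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    w = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return w, 1.0
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided tail probability
    return w, p


# Invented samples standing in for a feature measured in two crossing types.
designated = [0.12, 0.18, 0.22, 0.25, 0.31]
undesignated = [0.35, 0.41, 0.44, 0.48, 0.52]
w_stat, p_value = rank_sum_test(designated, undesignated)
```

With samples as small as these toy lists, an exact test would normally be preferred over the normal approximation; the approximation is used here only to keep the sketch short.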

Table 1.

The Sampling-Based Maximum Entropy Inverse Reinforcement Learning Algorithm, Adapted from Wu et al. (2020).

Algorithm: Sampling-Based Maximum Entropy IRL for Driving
Input: Driver trajectory demonstrations $D_{M} = \{ξ_{i}\}_{i = 1 : M}$
Output: Optimized reward function weights $θ^{*}$
1 Compute observed feature count: $\bar{f} (D_{M}) = \frac{1}{M} \sum_{i = 1}^{M} f (ξ_{i})$
2 Generate sample set $D_{s} = \{τ_{m}^{i}\}_{m = 1 : K_{i}, i = 1 : M}$
3 Initialize $θ \sim N (μ = 0, σ = 0.05)$
4 while step(error) > $10^{-10}$
5 Compute expected feature count: $\tilde{f} (D_{s}) = \frac{1}{M} \sum_{i = 1}^{M} \frac{1}{K} \sum_{m = 1}^{K_{i}} \frac{e^{R (τ_{m}^{i}, θ_{k})}}{\sum_{m' = 1}^{K_{i}} e^{R (τ_{m'}^{i}, θ_{k})}} f (τ_{m}^{i})$
6 Error = $‖ \bar{f} (D_{M}) - \tilde{f} (D_{s}) ‖$
7 Update $θ_{k}$ using stochastic gradient descent (Adam optimization)
8 $k = k + 1$
end
$θ^{*} = θ_{k}$
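The loop in Table 1 can be sketched as follows. This is an illustrative reading of the algorithm rather than the implementation used in the study. It assumes a linear reward $R(τ, θ) = θ^{\top} f(τ)$, assumes that feature vectors for the demonstrations and for each sample set are precomputed (how the sample set is generated is outside the sketch), and treats the softmax weights as already summing to one within each sample set, so no further division by the number of samples is applied. The function names, learning rate, iteration cap, and toy inputs are hypothetical.

```python
import math
import random


def expected_features(samples, theta):
    """Softmax-weighted feature average over each demonstration's sample set."""
    dim = len(theta)
    total = [0.0] * dim
    for sample_set in samples:
        rewards = [sum(t * f for t, f in zip(theta, feats)) for feats in sample_set]
        top = max(rewards)                      # subtract max for stability
        weights = [math.exp(r - top) for r in rewards]
        z = sum(weights)
        for w, feats in zip(weights, sample_set):
            for j in range(dim):
                total[j] += (w / z) * feats[j]
    return [t / len(samples) for t in total]


def maxent_irl(demo_features, samples, lr=0.05, tol=1e-10, max_iter=5000, seed=0):
    """Fit reward weights by matching observed and expected feature counts."""
    dim = len(demo_features[0])
    m_demos = len(demo_features)
    f_bar = [sum(f[j] for f in demo_features) / m_demos for j in range(dim)]  # step 1
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 0.05) for _ in range(dim)]                        # step 3
    m = [0.0] * dim                   # Adam first-moment estimate
    v = [0.0] * dim                   # Adam second-moment estimate
    b1, b2, eps = 0.9, 0.999, 1e-8
    prev_error = float("inf")
    error = float("inf")
    for k in range(1, max_iter + 1):
        f_tilde = expected_features(samples, theta)                           # step 5
        grad = [fb - ft for fb, ft in zip(f_bar, f_tilde)]
        error = math.sqrt(sum(g * g for g in grad))                           # step 6
        if abs(prev_error - error) < tol:                                     # step 4
            break
        prev_error = error
        for j in range(dim):                                                  # step 7
            m[j] = b1 * m[j] + (1 - b1) * grad[j]
            v[j] = b2 * v[j] + (1 - b2) * grad[j] ** 2
            m_hat = m[j] / (1 - b1 ** k)
            v_hat = v[j] / (1 - b2 ** k)
            theta[j] += lr * m_hat / (math.sqrt(v_hat) + eps)  # gradient ascent
    return theta, error


# Invented two-feature example: demonstrations favor the first feature.
toy_demos = [[0.9, 0.1], [0.8, 0.2]]
toy_samples = [[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]] * 2
theta_star, final_error = maxent_irl(toy_demos, toy_samples)
```

In this toy case the fitted weight on the first feature ends up larger than the weight on the second, mirroring how the estimated weights in the study are read as relative feature importance. Only differences between weights are identified here, because adding a constant to every reward leaves the softmax unchanged.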

Results

The estimated reward weights for both the driver and pedestrian are shown in Figure 1. Compared to undesignated crossings, designated crossings were characterized by a greater reward weight of the features directly related to kinematics, such as acceleration change and time gap for the driver, and pedestrians had greater weight for pausing behavior.

Figure 1.

Estimated reward weights for drivers (top) and pedestrians (bottom) in designated (gray) and undesignated (red) crossings.

This is supported by the post-hoc analysis, which found that in designated crossings, vehicles were observed to have a greater standard deviation of velocity when the pedestrian was walking or pausing (W = 7,463, p < .01) as seen in Figure 2. Hence, in designated crossings, drivers tended to change vehicle speed more when the pedestrian was moving but not when the pedestrian was standing. For drivers, across both environment types, the acceleration change had the largest relative reward weight, underscoring its important role, while bidirectional gaze was the least weighted communication feature (Figure 1). Comparing communication features across environment types indicated that the vehicle-pedestrian distance feature had a higher weighted importance in undesignated compared to designated crossings.

Figure 2.

The standard deviation (SD) of vehicle velocity given the kinematic state of the pedestrian in designated (gray) and undesignated (red) crossings.

As seen in Figure 3, drivers maintained an increasingly larger distance from the pedestrian in the undesignated compared to designated crossing (initial distance: no statistically significant difference, W = 3,398, p = .56; median distance: trend toward undesignated greater, W = 3,216, p < .1; final distance: undesignated significantly greater, W = 3,046, p < .05).

Figure 3.

The initial (left), median (middle), and final (right) vehicle distance to the pedestrian crossing point in designated (gray) and undesignated (red) crossings.

Finally, the bidirectional gaze feature had a greater value for both the driver and the pedestrian in undesignated crossings. Analysis of gaze behavior indicated that, in cases where bidirectional gazing was present, it was observed for a greater proportion of the encounter in undesignated crossings (W = 153, p < .05), where it was mainly observed in 35% to 50% of the encounter duration. This is in contrast to designated crossings, where bidirectional gazing was less prevalent (in less than 25% of the encounter duration) (Figure 4).

Figure 4.

The proportion of bidirectional gazing in each encounter in designated (gray) and undesignated (red) crossings.

Discussion and Conclusion

This study unveils the nuanced communication underlying driver-pedestrian interactions. By quantifying the relative importance of kinematic and behavioral features in designated and undesignated crossings, the analysis shows clear distinctions in how road users prioritize different aspects of the interaction. In designated crossings, dynamic features such as acceleration changes and time gaps, alongside pedestrian pausing behavior, emerged as important features, which suggests that structured crossings facilitate more predictable and adaptive driver responses. Conversely, undesignated crossings placed a greater emphasis on maintaining larger vehicle-pedestrian distances and extended periods of bidirectional gaze, highlighting a reliance on non-verbal cues in less structured contexts. These findings are consistent with previous work suggesting that higher levels of protection were associated with more interdependent kinematic behaviors (Domeyer et al., 2020; Noonan et al., 2022), more spread-out patterns of shared information between movement cues (Noonan et al., 2023), and greater reliance on bidirectional gaze behavior (Pipkorn et al., 2024).

The use of IRL uncovers implicit priorities directly from observed behavior in real-world scenarios. This approach helps to identify which features (e.g., acceleration changes, gaze, or distance) carry the most influence in shaping driver-pedestrian interactions, even in subtle or context-dependent interactions. Findings from the study enhance our understanding of the underlying utility functions that drive behavior in complex traffic scenarios. Advanced vehicle technologies may support driver-pedestrian communication by utilizing and incorporating channels of communication that are relevant for the different contexts of the environment and interaction.

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Toyota Collaborative Safety Research Center. The first author’s efforts in the development of this publication were supported by the New England University Transportation Center (NEUTC; US DOT UTC Grant number 69A3552348301). Data for this study were drawn from work supported by the Advanced Vehicle Technologies (AVT) Consortium at MIT. The views and conclusions expressed are those of the authors and have not been sponsored, approved, or endorsed by supporting organizations.

References

Chen; Gupta, A. K. (2012). Parametric statistical change point analysis: With applications to genetics, medicine, and finance. Birkhäuser. https://doi.org/10.1007/978-0-8176-4801-5

Domeyer, J. E.; Dinparastdjadid; Lee, J. D.; Douglas; Alsaid; Price (2019). Proxemics and kinesics in automated vehicle–pedestrian communication: Representing ethnographic observations. Transportation Research Record, 2673(10), 70–81. https://doi.org/10.1177/0361198119848413

Domeyer, J. E.; Lee, J. D.; Toyoda; Mehler; Reimer (2020). Interdependence in vehicle-pedestrian encounters and its implications for vehicle automation. IEEE Transactions on Intelligent Transportation Systems, 22(12), 7252–7264. https://doi.org/10.1109/TITS.2020.3041562

Fridman; Brown, D. E.; Glazer; Angell; Dodd; Jenik; Terwilliger; Patsekin; Kindelsberger; Ding; Seaman; Mehler; Sipperley; Pettinato; Seppelt, B. D.; Angell; Mehler; Reimer (2019). MIT advanced vehicle technology study: Large-scale naturalistic driving study of driver behavior and interaction with automation. IEEE Access, 7, 102021–102038. https://doi.org/10.1109/ACCESS.2019.2926040

Lee, Y. M.; Madigan; Giles; Garach-Morcillo; Markkula; Fox; Camara; Rothmueller; Vendelbo-Larsen, S. A.; Rasmussen, P. H.; Dietrich; Nathanael; Portouli; Schieben; Merat (2021). Road users rarely use explicit communication when interacting in today’s traffic: Implications for automated vehicles. Cognition, Technology & Work, 23(2), 367–380. https://doi.org/10.1007/s10111-020-00635-y

Ng, A. Y.; Russell (2000). Algorithms for inverse reinforcement learning. In Proceedings of the 17th international conference on machine learning (ICML 2000) (pp. 663–670). Morgan Kaufmann.

Noonan, T. Z.; Gershon; Domeyer; Mehler; Reimer (2022). Interdependence of driver and pedestrian behavior in naturalistic roadway negotiations. Traffic Injury Prevention, 23(Suppl 1), S62–S67. https://doi.org/10.1080/15389588.2022.2108023

Noonan, T. Z.; Gershon; Domeyer; Mehler; Reimer (2023, October 3). Kinematic cues in driver-pedestrian communication to support safe road crossing. In Proceedings of the 67th annual scientific conference of the association for the advancement of automotive medicine (AAAM), Indianapolis, IN.

Pipkorn; Domeyer; Mehler; Reimer; Gershon (2025). Decoding the silent dialogue: Unveiling driver-pedestrian communication dynamics with a hidden Markov model. Transportation Research Part F: Traffic Psychology and Behaviour, 109, 965–976. https://doi.org/10.1016/j.trf.2025.01.011

Pipkorn; Noonan, T. Z.; Domeyer; Mehler; Reimer; Gershon (2024). Naturalistic analysis of bidirectional gazing during vehicle-pedestrian encounters at road crossings. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 10711813241276467. https://doi.org/10.1177/10711813241276467

Russell (1998). Learning agents for uncertain environments. In Proceedings of the eleventh annual conference on computational learning theory (COLT ’98) (pp. 101–103). ACM.

von Neumann; Morgenstern; Rubinstein (1944). Theory of games and economic behavior (60th anniversary commemorative ed.). Princeton University Press.

Wu; Sun; Zhan; Yang; Tomizuka (2020). Efficient sampling-based maximum entropy inverse reinforcement learning with application to autonomous driving (No. arXiv:2006.13704). arXiv. https://doi.org/10.48550/arXiv.2006.13704

Ziebart, B. D.; Maas; Bagnell, J. A.; Dey, A. K. (2008). Maximum entropy inverse reinforcement learning. In Proceedings of the twenty-third AAAI conference on artificial intelligence (AAAI-08) (pp. 1433–1438). AAAI Press.

Quantifying the Role of Kinematic and Behavioral Features in Driver-Pedestrian Interaction across Environments: An Inverse Reinforcement Learning Approach
