Abstract
This study predicts drivers’ situational awareness (SA) of specific traffic elements during takeover transitions in Level 3 automated vehicles. Using a driving simulator dataset from 44 participants, we analyzed multimodal features, including driver characteristics, physiological data, eye movements, and environmental attributes. Various machine learning models (e.g., SVM, Logistic Regression, and XGBoost) were optimized through feature selection and time window tuning. The SVM model, using a 3-second window after the takeover request (TOR) and a 1-second window before it, achieved the best performance, with a macro F1 score of 0.75 and an accuracy of 0.77. Our approach highlights the importance of comprehensive feature sets and timely prediction for improving driver support systems. The resulting model aids in identifying potential hazards and enhances takeover readiness, contributing to safer transitions in automated driving.
Objectives
Level 3 conditionally automated vehicles allow drivers to engage in non-driving-related tasks during automated driving but require them to take over the vehicle when the situation demands it (Society of Automotive Engineers, 2018). During takeover transitions, drivers who have been out of the vehicle control loop for a while may lack situational awareness (SA) of the driving environment, potentially resulting in inappropriate takeover responses. Hence, monitoring a driver’s SA and providing adaptive support information is critical to enhancing the safety and efficiency of the takeover process.
Researchers have begun developing machine learning models that use multimodal datasets to predict drivers’ SA during driving and takeover processes (Yang et al., 2023; Zhou et al., 2021; Zhu et al., 2021). However, previous studies on SA prediction in takeover scenarios leave several gaps. First, prior research primarily focused on predicting drivers’ overall SA of the entire scene (e.g., Yang et al., 2023; Zhou et al., 2021) rather than their SA of individual environmental elements. Second, the feature sets in previous studies often lacked comprehensiveness. For instance, physiological data such as heart rate (HR) and galvanic skin response (GSR), which have been shown to relate to drivers’ SA (Liang et al., 2021), were not incorporated (Yang et al., 2023; Zhou et al., 2021). Third, existing studies considered only traffic elements visible through the windshield and ignored objects behind or to the side-rear of the vehicle, which are also important to takeover responses.
To address these gaps, the present study aimed to predict drivers’ SA of traffic elements during the takeover process in conditionally automated vehicles. The modeling used a dataset collected in a driving simulator experiment that encompassed a range of takeover events. Predictive features were extracted according to our predefined feature framework, covering multiple categories of information: the driver’s characteristics, physiological state, eye movement behavior, and environmental attributes. SA labels for specific objects were obtained from drivers’ scene reconstruction tasks following the takeover process. Our modeling methodology incorporated feature selection, grid search, hyperparameter tuning, and time window optimization across various statistical machine learning models, including Logistic Regression (LR), Random Forest (RF), Linear Discriminant Analysis (LDA), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), and Neural Network (NN), to identify the best-performing model.
Approach
The dataset, comprising 264 takeover events from 44 participants (average age = 24.0, SD = 3.2; 14 females and 30 males), was collected in a simulator experiment in which participants played Tetris during automated driving and took over the vehicle when a TOR was issued. Each participant experienced six challenging takeover scenarios with varying traffic densities, with surrounding vehicles appearing randomly at different locations relative to the ego vehicle (e.g., side front, side behind, and behind).
The model features included driver characteristics (e.g., license year), environmental attributes (e.g., traffic density), eye movement behavior (e.g., gaze behavior), and physiological state data (e.g., electrocardiograms) before and after the takeover request (TOR). The ground truth was labeled by asking drivers to complete a scene reconstruction task after the takeover and computing their object localization accuracy.
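The labeling step can be sketched as follows. This is a minimal illustration, not the study's exact procedure: the distance threshold, coordinate layout, and object names are all assumptions made for the example.

```python
import math

# Hypothetical sketch of SA labeling from a scene reconstruction task.
# The 5 m tolerance and the data layout below are assumptions.
LOC_ERROR_THRESHOLD = 5.0  # metres

def label_sa(true_pos, reconstructed_pos, threshold=LOC_ERROR_THRESHOLD):
    """Label SA of one object: 1 if the driver placed it within
    `threshold` metres of its true location, else 0."""
    if reconstructed_pos is None:  # object omitted from the reconstruction
        return 0
    error = math.dist(true_pos, reconstructed_pos)
    return 1 if error <= threshold else 0

# One takeover event: true object positions vs. the driver's reconstruction.
true_objects = {"lead_car": (0.0, 20.0), "rear_car": (0.0, -15.0)}
reconstruction = {"lead_car": (1.0, 18.0), "rear_car": None}

labels = {obj: label_sa(pos, reconstruction.get(obj))
          for obj, pos in true_objects.items()}
print(labels)  # {'lead_car': 1, 'rear_car': 0}
```

An object the driver fails to place (or places too far from its true position) is labeled as outside their SA, which yields the per-object binary labels the classifiers are trained on.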
To construct the model, we first subjected the initial set of 58 features to feature selection based on importance and inter-correlation, yielding a final list of 28 features. Second, using grid search, we iteratively explored combinations of time windows (ranging from 0 to 10 seconds for both the pre-TOR and post-TOR windows), models (LR, LDA, XGBoost, SVM, and NN), and typical hyperparameters for each model. We trained the models with cross-validation and resampling and evaluated their performance by macro F1 score, which accounts for the dataset's class imbalance and ensures balanced recognition of positive and negative instances.
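The joint search over windows, models, and hyperparameters can be sketched with scikit-learn. In this sketch, `extract_features` is a placeholder that returns synthetic data (real features would be aggregated over the chosen windows), and the grids are reduced for brevity; the study searched 0–10 s windows across five model families.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def extract_features(pre_tor_s, post_tor_s, n_events=264, n_features=28):
    # Placeholder: returns synthetic features and SA labels for one
    # (pre-TOR, post-TOR) window combination.
    X = rng.normal(size=(n_events, n_features))
    y = rng.integers(0, 2, size=n_events)
    return X, y

# Reduced, illustrative hyperparameter grids for two of the model families.
candidates = {
    "SVM": [SVC(C=c, class_weight="balanced") for c in (0.1, 1.0, 10.0)],
    "LR": [LogisticRegression(C=c, max_iter=1000) for c in (0.1, 1.0)],
}

best_cfg, best_f1 = None, -1.0
for pre in (0, 1, 2):            # pre-TOR window length, seconds
    for post in (1, 3, 5):       # post-TOR window length, seconds
        X, y = extract_features(pre, post)
        for name, models in candidates.items():
            for model in models:
                # Macro F1 balances recognition of both SA classes.
                f1 = cross_val_score(model, X, y, cv=5,
                                     scoring="f1_macro").mean()
                if f1 > best_f1:
                    best_cfg, best_f1 = (name, pre, post), f1
```

The inner loop treats the window lengths exactly like hyperparameters, so one search simultaneously selects the model, its settings, and how much pre- and post-TOR data to aggregate.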
Findings
The SVM model, using a 3 s post-TOR window and a 1 s pre-TOR window, delivered the best performance. It achieved a macro F1 score of 0.75 and an accuracy of 0.77 while maintaining balanced recall of 0.77 for both the positive and negative classes, despite the short time window. This enables accurate prediction of drivers’ SA shortly after the TOR and the delivery of corresponding assistance information (e.g., warnings about objects the driver is unaware of), thereby enhancing the efficiency and safety of takeovers in automated driving.
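The reported metrics can be computed directly with scikit-learn. The labels below are toy values, not the study's predictions; they are chosen so the resulting numbers echo the balanced pattern described above.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Toy predictions: 4 positive (SA) and 4 negative (no-SA) ground-truth
# labels, with one miss in each class.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))             # 0.75
# Macro F1 averages the per-class F1 scores, so it rewards balanced
# recognition of both classes under class imbalance.
print(f1_score(y_true, y_pred, average="macro"))  # 0.75
print(recall_score(y_true, y_pred, pos_label=1))  # 0.75 (positive class)
print(recall_score(y_true, y_pred, pos_label=0))  # 0.75 (negative class)
```

Checking both per-class recalls alongside macro F1, as the study does, guards against a model that scores well by favoring the majority class.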
After finalizing the model, hyperparameters, and time window, we conducted additional feature exploration to improve model performance and better understand feature significance. First, we grouped the features by category and by the time window they belonged to, then assessed the model’s performance for each feature category and each time window separately. Models trained on features from a single category or a single time window performed significantly worse than models using all features, underscoring the comprehensiveness and soundness of our time window and feature selection. Second, we added features one by one in order of importance. Model performance improved markedly at first and then stabilized, with little further gain once the top nine features were included. In practical applications, therefore, a modest reduction of the feature set could balance predictive performance against computational cost.
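The add-features-by-importance analysis can be sketched as follows. The data and labels are synthetic stand-ins for the 28 selected features, and the random-forest importance ranking is one common choice of importance measure, not necessarily the one used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_events, n_features = 264, 28
X = rng.normal(size=(n_events, n_features))
# Make the first few synthetic features genuinely informative.
y = (X[:, 0] + 0.8 * X[:, 1] + 0.5 * X[:, 2]
     + rng.normal(scale=0.5, size=n_events) > 0).astype(int)

# Rank features by importance (here via a random forest).
importances = RandomForestClassifier(random_state=0).fit(X, y).feature_importances_
rank = np.argsort(importances)[::-1]

# Refit on the top-k features and track cross-validated macro F1.
scores = [cross_val_score(SVC(), X[:, rank[:k]], y, cv=5,
                          scoring="f1_macro").mean()
          for k in range(1, n_features + 1)]
# The score curve typically climbs steeply, then plateaus; beyond that
# point, trimming features trades little accuracy for lower cost.
```

Plotting `scores` against `k` reveals the plateau, and the smallest `k` near peak performance is the natural cut-off for a deployed, cost-constrained feature set.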
Takeaways
We constructed an SVM model for predicting a driver’s SA of specific objects in diverse traffic scenarios. Compared with previous SA prediction studies, our model offers object-specific predictions with improved timeliness, enhancing its applicability in driver support systems. Moreover, we simulated diverse traffic conditions with an emphasis on modeling the driver’s SA of objects visible in the rear-view and side-view mirrors, improving the model’s generalizability across situations. In addition, we extracted a systematic and comprehensive feature list based on our feature framework. These features were derived from both the pre-TOR and post-TOR windows and spanned four categories: the driver’s characteristics, physiological state, eye movement behavior, and environmental attributes.
The modeling approach and models proposed in this study will contribute to the design and implementation of advanced driver monitoring and support systems. During the takeover process, our model can help identify potential hazards that drivers may not be aware of and provide a comprehensive assessment of the driver’s takeover readiness, serving as a detailed reference for driver support messages.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We acknowledge the support from the Pitt Momentum Funds.
