Abstract
Distracted driving is a leading cause of traffic crashes, and research on driver behavior recognition (DBR) has the potential to reduce the number of crashes caused by such distractions. However, existing DBR networks are trained and validated on a single data set, which leads to overfitting to specific scenarios and limits their generalization to others. In addition, these networks consider only contextual geometric features for recognition and neglect the driver's skeletal spatial features, which restricts their ability to capture the crucial visual distinctions among driver behaviors. To address these issues, a deep attention network with geometric–spatial fusion features (DAN–GSFF) is proposed. Specifically, DAN–GSFF takes the image's global information and the driver's pose information as dual inputs, simultaneously exploiting contextual geometric and skeletal spatial features for recognition. By embedding a multidimensional collaborative visual attention module for DBR (MCA–DBR), DAN–GSFF is guided to selectively focus on the driver's local detailed features. Furthermore, a large-scale and diverse data set (SAA13) is compiled, covering a wide variety of scenarios and including data from 272 drivers, which provides a more realistic representation of real-world conditions. Experimental results demonstrate that DAN–GSFF outperforms other state-of-the-art models, achieving 90.11% accuracy at 96.2 frames per second on the SAA13 comprehensive data set. Together with real-time verification on video streams, these results indicate that DAN–GSFF offers strong recognition performance and robust generalization ability in complex driving scenarios.
