Abstract
To address the low accuracy of pedestrian detection in vehicle vision under low resolution, varying illumination, partial occlusion, and dynamic backgrounds, this paper proposes a human skeleton pose and pedestrian detection model based on spatiotemporal gait features. Combined with real-time prediction of the pedestrian-vehicle relative distance for accident prevention and warning, the model strengthens active safety warning. First, building on the spatiotemporal graph convolutional network, a mainstream skeleton pose estimation method, this paper introduces a Spatiotemporal-Enhanced Graph Convolutional Network (STE-GCN) by augmenting the spatiotemporal modules. Next, heatmaps are stacked along the temporal dimension to form a 3D heatmap volume that preserves the spatiotemporal features of the joint points, and a uniform sampling strategy is adopted to reduce redundancy in the volume. A gait joint point feature extraction backbone is then constructed on the ResNet-50 network, leading to the proposed PoseConv3D pedestrian detection model. In comparison experiments, PoseConv3D outperforms the STE-GCN, Gait Graph, and MS-G3D models on the test set, achieving a Top-1 accuracy of 97.67% and a Mean Class Accuracy of 96.67%. To enable accident prevention and warning, this paper constructs a depth estimation model based on a Swin-Transformer encoder and a multi-scale fusion decoder. By incorporating multi-scale fusion modules, residual optimization modules, iterative adaptive bins, and an iterative optimizer, the model achieves high-accuracy, real-time prediction of the pedestrian-vehicle relative distance. Ablation experiments confirm that the combined model reduces the mean absolute relative error (AbsRel) to 0.050, which improves the robustness of environmental perception.
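The 3D heatmap volume and uniform sampling described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the Gaussian joint heatmaps with a fixed `sigma`, the choice of the middle frame of each temporal segment, and the `(K, T', H, W)` layout are all assumptions made for illustration.

```python
import numpy as np

def uniform_sample_indices(num_frames, num_clips):
    """Split [0, num_frames) into num_clips equal segments and
    return the middle frame index of each (assumed sampling rule)."""
    edges = np.linspace(0, num_frames, num_clips + 1)
    return ((edges[:-1] + edges[1:]) / 2).astype(int)

def joint_heatmaps(keypoints, height, width, sigma=2.0):
    """Turn (K, 2) joint coordinates (x, y) into K Gaussian heatmaps
    of shape (K, height, width). sigma is a hypothetical default."""
    ys, xs = np.mgrid[0:height, 0:width]
    kx = keypoints[:, 0][:, None, None]
    ky = keypoints[:, 1][:, None, None]
    return np.exp(-((xs - kx) ** 2 + (ys - ky) ** 2) / (2 * sigma ** 2))

def heatmap_volume(sequence, height, width, num_clips):
    """Stack per-frame heatmaps of the sampled frames along a temporal
    axis. sequence: (T, K, 2) -> volume: (K, num_clips, height, width)."""
    idx = uniform_sample_indices(len(sequence), num_clips)
    frames = [joint_heatmaps(sequence[t], height, width) for t in idx]
    return np.stack(frames, axis=1)

# Usage: 8 frames, 17 joints, every joint fixed at (x=5, y=3).
seq = np.zeros((8, 17, 2))
seq[..., 0] = 5
seq[..., 1] = 3
vol = heatmap_volume(seq, height=16, width=16, num_clips=4)
print(vol.shape)
```

Sampling one frame per segment keeps the temporal extent of the gait cycle while shrinking the volume, which is the redundancy reduction the abstract refers to; a 3D convolutional backbone would then consume `vol` as a `K`-channel input.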
Finally, real-world vehicle experiments demonstrate that the system can effectively identify behavioral patterns such as walking, running, and standing still under sufficient illumination, low light, occlusion, and dynamic backgrounds. By establishing a hierarchical warning mechanism based on safe braking distance, the system provides reliable technical support for decision-making and control in autonomous vehicles.
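A hierarchical warning mechanism based on safe braking distance could take roughly the following form. This is a hedged sketch only: the abstract does not give the paper's levels or thresholds, so the three levels, the reaction time, the deceleration, and the `caution_factor` below are hypothetical values, and the braking distance uses the standard kinematic estimate of reaction distance plus v²/(2a).

```python
from enum import Enum

class AlertLevel(Enum):
    NONE = 0      # pedestrian well outside the braking envelope
    CAUTION = 1   # pedestrian approaching the braking envelope
    BRAKE = 2     # pedestrian inside the safe braking distance

def braking_distance(speed_mps, reaction_time_s=1.0, decel_mps2=6.0):
    """Reaction distance plus kinematic stopping distance v^2 / (2a).
    The default reaction time and deceleration are assumed values."""
    return speed_mps * reaction_time_s + speed_mps ** 2 / (2 * decel_mps2)

def alert_level(distance_m, speed_mps, caution_factor=2.0):
    """Map a predicted pedestrian-vehicle distance to a warning level.
    caution_factor is a hypothetical margin on the braking distance."""
    d_safe = braking_distance(speed_mps)
    if distance_m <= d_safe:
        return AlertLevel.BRAKE
    if distance_m <= caution_factor * d_safe:
        return AlertLevel.CAUTION
    return AlertLevel.NONE

# Usage: vehicle at 12 m/s, pedestrian distance from the depth model.
for d in (20.0, 40.0, 60.0):
    print(d, alert_level(d, speed_mps=12.0).name)
```

In a full system the `distance_m` argument would come from the depth estimation model, and the thresholds would be calibrated to the vehicle and road conditions.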
