Abstract
Accurate three-dimensional (3D) vehicle detection in roadside light detection and ranging (LiDAR) point clouds is critical for intelligent transportation systems, as it enhances traffic efficiency, strengthens safety management, and supports vehicle–road–cloud collaboration. This paper addresses three key challenges in this setting: uneven point cloud density, the limitations of single-modality representations (feature diffusion in voxel-based methods, loss of vertical information in pillar-based methods), and inefficient modeling of long-range dependencies. We propose VPMambaNet, a novel model that integrates three core innovations: (1) a voxel–pillar hybrid representation with a dual-path architecture, which combines the vertical detail preserved by voxels with the efficient coverage of sparse regions provided by pillars; (2) the hybrid scan state-space module, a cascaded state-space module that uses Hilbert and cross scans for hierarchical local-to-global modeling with linear complexity; and (3) the neighborhood attention extension-based voxel–pillar fusion module, which enables progressive cross-modal integration. Experiments on DAIR-V2X-I show that VPMambaNet outperforms state-of-the-art methods by 1.11%–2.21% in average precision across difficulty levels, with larger gains in complex scenarios. Ablation and qualitative analyses validate its robustness to sparse point clouds, long-range targets, and annotation noise. VPMambaNet provides an efficient and accurate solution for roadside 3D vehicle detection, directly supporting practical transportation applications such as real-time traffic monitoring and collaborative autonomous driving.
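To make the voxel/pillar distinction in the abstract concrete, the following minimal sketch groups the same points both ways. It is an illustration only, not the VPMambaNet implementation: the function name `group_points`, the 0.5 m cell size, and the sample points are all assumptions chosen for the example.

```python
from collections import defaultdict


def group_points(points, cell=(0.5, 0.5, 0.5)):
    """Group (x, y, z) points into voxels (3D cells) and pillars (2D columns).

    `cell` is a hypothetical grid resolution in metres, not a value
    taken from the paper.
    """
    voxels = defaultdict(list)
    pillars = defaultdict(list)
    for x, y, z in points:
        ix = int(x // cell[0])
        iy = int(y // cell[1])
        iz = int(z // cell[2])
        # A voxel keeps the vertical index, so height structure is preserved.
        voxels[(ix, iy, iz)].append((x, y, z))
        # A pillar drops the vertical index: every point in the same
        # (ix, iy) column lands in one cell, whatever its height.
        pillars[(ix, iy)].append((x, y, z))
    return dict(voxels), dict(pillars)


# Three points stacked above one ground location, plus one distant point.
points = [(0.1, 0.2, 0.1), (0.3, 0.4, 1.2), (0.2, 0.1, 2.6), (5.1, 7.3, 0.4)]
voxels, pillars = group_points(points)

print(len(voxels), "voxels;", len(pillars), "pillars")
```

In this toy input the three stacked points occupy three separate voxels but a single pillar, which illustrates the trade-off the abstract describes: voxels retain vertical detail, while pillars cover the scene with fewer, denser cells.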
