VPMambaNet: Breaking Limits in Roadside Three-Dimensional Vehicle Detection via Hybrid Voxel–Pillar Modeling and Mixed-Scan State Space Model for Intelligent Traffic Management

Abstract

Accurate three-dimensional (3D) vehicle detection in roadside light detection and ranging (LiDAR) point clouds is critical for intelligent transportation systems, as it enhances traffic efficiency, strengthens safety management, and supports vehicle–road–cloud collaboration. This paper addresses three key challenges in this setting: uneven point cloud density, the limitations of single-modality representations (feature diffusion in voxel-based methods, loss of vertical information in pillar-based methods), and inefficient long-range dependency modeling. We propose VPMambaNet, a novel model that integrates three core innovations: (1) a voxel–pillar hybrid representation with a dual-path architecture, which combines the vertical detail preserved by voxels with the efficient coverage of sparse regions provided by pillars; (2) the hybrid scan state space module, a cascaded state-space module that uses Hilbert and cross scans for hierarchical local-to-global modeling with linear complexity; and (3) the neighborhood attention extension-based voxel–pillar fusion module, which enables progressive cross-modal integration. Experiments on DAIR-V2X-I show that VPMambaNet outperforms state-of-the-art methods by 1.11%–2.21% in average precision across difficulty levels, with larger gains in complex scenarios. Ablation and qualitative analyses validate its robustness to sparse point clouds, long-range targets, and annotation noise. VPMambaNet provides an efficient and accurate solution for roadside 3D vehicle detection, directly supporting practical transportation applications such as real-time traffic monitoring and autonomous driving collaboration.
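To make the idea of a Hilbert scan concrete, the following is a minimal sketch of how occupied bird's-eye-view grid cells could be serialized along a Hilbert curve before being passed to a sequence model. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not describe the scan in detail, and the function names (`hilbert_index`, `hilbert_scan`), the 4 × 4 grid, and the sample cells are all hypothetical. The index computation follows the standard iterative Hilbert-curve mapping for a square grid whose side is a power of two.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        # Which quadrant of the current sub-square does the cell fall in?
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the next level sees a canonically oriented quadrant.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_scan(cells, n):
    """Order occupied cells by their Hilbert index (hypothetical helper)."""
    return sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))


if __name__ == "__main__":
    # Hypothetical occupied pillar coordinates on a 4 x 4 grid.
    occupied = [(3, 0), (0, 0), (1, 1), (2, 3), (0, 3)]
    print(hilbert_scan(occupied, 4))
```

Unlike a row-major scan, consecutive positions along a Hilbert curve are always spatially adjacent cells, which is one plausible reason such an ordering suits local modeling in a one-dimensional state-space sequence.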

Keywords

roadside perception, LiDAR point cloud, 3D vehicle detection, Mamba, intelligent transportation system
