Abstract
3D object detection plays a vital role in autonomous driving, yet existing point cloud based methods face a fundamental trade-off between detection accuracy and computational efficiency. The detectors with the fastest inference speed usually rely on pillar encoding, but they tend to lag behind their popular counterparts in detection accuracy. In this paper, we propose AA-Pillars, a high-performance pillar-based detector comprising two key modules: the Agent Attention based Pillar Feature Net (AA-PFN) and the Next Depthwise Separable Convolution Network (NDC-Net). AA-PFN leverages the agent attention mechanism to efficiently extract refined pillar features and minimize information loss. NDC-Net enhances feature extraction from the pseudo-image by introducing depthwise separable convolutions. Extensive experiments on the KITTI and nuScenes datasets demonstrate the superiority of our method, which outperforms classic point cloud based baselines. When integrated into multimodal frameworks, AA-Pillars improves model performance without compromising inference speed. These results underscore our method's ability to balance accuracy and efficiency while offering seamless compatibility with existing multimodal fusion pipelines, providing a universal solution for multi-sensor perception systems.
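The abstract names depthwise separable convolution as the key operation inside NDC-Net. The paper's own layer definitions are not given here, so the following is only a minimal NumPy sketch of the generic technique: a per-channel (depthwise) spatial convolution followed by a 1x1 (pointwise) channel-mixing convolution. All names and shapes below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Generic depthwise separable convolution sketch (not the paper's NDC-Net).

    x          : input feature map, shape (C, H, W)
    dw_kernels : one spatial kernel per input channel, shape (C, k, k)
    pw_weights : 1x1 pointwise mixing matrix, shape (C_out, C)
    Uses 'valid' padding and stride 1 for simplicity.
    """
    c, h, w = x.shape
    k = dw_kernels.shape[1]
    oh, ow = h - k + 1, w - k + 1

    # Depthwise step: each channel is convolved independently
    # with its own kernel -- no mixing across channels yet.
    dw = np.zeros((c, oh, ow))
    for ch in range(c):
        for i in range(oh):
            for j in range(ow):
                dw[ch, i, j] = np.sum(x[ch, i:i + k, j:j + k] * dw_kernels[ch])

    # Pointwise step: a 1x1 convolution mixes channels at each
    # spatial location, i.e. a matrix product over the channel axis.
    return np.einsum('oc,chw->ohw', pw_weights, dw)
```

Compared with a standard convolution, which needs C_out * C * k * k weights per layer, the factored form needs only C * k * k + C_out * C, which is the efficiency argument usually made for pillar-based detectors.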
