Abstract
3D object detection plays a vital role in autonomous driving, yet existing point cloud based methods face a fundamental trade-off between detection accuracy and computational efficiency. The detectors with the fastest inference speed usually rely on pillar encoding, but they tend to lag behind their popular counterparts in detection accuracy. In this paper, we propose AA-Pillars, a high-performance pillar-based detector comprising two key modules: the Agent Attention based Pillar Feature Net (AA-PFN) and the Next Depthwise Separable Convolution Network (NDC-Net). AA-PFN leverages the agent attention mechanism to efficiently extract refined pillar features and minimize information loss. NDC-Net enhances feature extraction from the pseudo-image by introducing depthwise separable convolutions. Extensive experiments on the KITTI and nuScenes datasets demonstrate the superiority of our method, which outperforms classic point cloud based baselines. When integrated into multimodal frameworks, AA-Pillars improves model performance without compromising inference speed. These results underscore our method's ability to balance accuracy and efficiency while offering seamless compatibility with existing multimodal fusion pipelines, providing a universal solution for multi-sensor perception systems.
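The abstract names depthwise separable convolution as the key operation inside NDC-Net. The paper's own layer definitions are not given here, so the following is only a minimal NumPy sketch of the generic technique: a per-channel (depthwise) spatial convolution followed by a 1x1 (pointwise) channel-mixing convolution. All names and shapes below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Generic depthwise separable convolution sketch (not the paper's NDC-Net).

    x          : input feature map, shape (C, H, W)
    dw_kernels : one spatial kernel per input channel, shape (C, k, k)
    pw_weights : 1x1 pointwise mixing matrix, shape (C_out, C)
    Uses 'valid' padding and stride 1 for simplicity.
    """
    c, h, w = x.shape
    k = dw_kernels.shape[1]
    oh, ow = h - k + 1, w - k + 1

    # Depthwise step: each channel is convolved independently
    # with its own kernel -- no mixing across channels yet.
    dw = np.zeros((c, oh, ow))
    for ch in range(c):
        for i in range(oh):
            for j in range(ow):
                dw[ch, i, j] = np.sum(x[ch, i:i + k, j:j + k] * dw_kernels[ch])

    # Pointwise step: a 1x1 convolution mixes channels at each
    # spatial location, i.e. a matrix product over the channel axis.
    return np.einsum('oc,chw->ohw', pw_weights, dw)
```

Compared with a standard convolution, which needs C_out * C * k * k weights per layer, the factored form needs only C * k * k + C_out * C, which is the efficiency argument usually made for pillar-based detectors.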
