Abstract
Accurate detection of 3D obstacles is crucial for autonomous vehicles and intelligent traffic systems. Multi-modal fusion of camera and LiDAR data in 3D object detection can fully exploit the complementary advantages of the two sensors, improving the accuracy and robustness of detection and making fusion a core component of the perception system in autonomous vehicles. However, owing to inherent differences between the sensors' data, fusing them for 3D object detection still poses numerous challenges. To address this issue, a 3D object detection algorithm based on multi-scale, feature-weighted, point-wise fusion is proposed. Point-wise correspondences are established between camera images and LiDAR point clouds, and a ResNet50 network is employed to extract multi-scale semantic features from the images. Weights are assigned to the channels of the image features according to their importance, and the resulting semantic information is used to enhance the point features. This approach helps overcome the difficulty of matching images and point clouds, whose data structures are disparate, and fully exploits the complementary nature of multi-modal information. Experimental results on the KITTI object detection benchmark show that the proposed algorithm achieves an average detection accuracy of 80.95%, a 1.34% improvement over previous multi-modal algorithms, demonstrating superior 3D object detection performance.
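To make the fusion idea concrete, the following is a minimal PyTorch sketch of point-wise camera–LiDAR fusion as the abstract describes it: LiDAR points are projected into the image, multi-scale ResNet50 features are sampled at the projected locations, channels are weighted by learned importance, and the weighted image features enhance the point features. The class name `PointwiseFusion`, the choice of ResNet50 stages, the squeeze-and-excitation-style channel gate, and all dimensions are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of multi-scale, feature-weighted, point-wise fusion.
# Assumptions (not from the paper): layer choices, channel widths, and the
# use of a squeeze-and-excitation-style gate for channel weighting.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50


class PointwiseFusion(nn.Module):
    def __init__(self, point_dim=64, img_dim=256):
        super().__init__()
        backbone = resnet50(weights=None)
        # Keep intermediate stages to extract multi-scale semantic features.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        # Reduce each scale to a common channel width before fusion.
        self.reduce1 = nn.Conv2d(256, img_dim, 1)
        self.reduce2 = nn.Conv2d(512, img_dim, 1)
        # Channel-wise importance weighting (assumed SE-style gate).
        self.gate = nn.Sequential(nn.Linear(img_dim, img_dim // 4),
                                  nn.ReLU(),
                                  nn.Linear(img_dim // 4, img_dim),
                                  nn.Sigmoid())
        self.out = nn.Linear(point_dim + img_dim, point_dim)

    def forward(self, image, point_feats, uv):
        """image: (B, 3, H, W); point_feats: (B, N, point_dim);
        uv: (B, N, 2) projected point coordinates, normalized to [-1, 1]."""
        x = self.stem(image)
        c1 = self.layer1(x)          # first semantic scale
        c2 = self.layer2(c1)         # second, coarser scale
        f1, f2 = self.reduce1(c1), self.reduce2(c2)
        # Sample per-point image features at the projected locations.
        grid = uv.unsqueeze(1)       # (B, 1, N, 2) for grid_sample
        s1 = F.grid_sample(f1, grid, align_corners=False).squeeze(2).transpose(1, 2)
        s2 = F.grid_sample(f2, grid, align_corners=False).squeeze(2).transpose(1, 2)
        img_feat = s1 + s2           # combine scales (a simple choice here)
        # Weight channels by learned importance, then fuse with point features.
        img_feat = img_feat * self.gate(img_feat)
        return self.out(torch.cat([point_feats, img_feat], dim=-1))
```

The `uv` coordinates would come from projecting each LiDAR point through the camera calibration matrices (available per frame in KITTI); that projection step is omitted here for brevity.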
