Abstract
Roadside camera-based 3D object detection is crucial for intelligent transportation systems, as it extends perception beyond the limitations of vehicle-mounted sensors and enhances road safety. While vehicle-mounted sensors often suffer from obstructed fields of view and limited long-range perception, roadside cameras offer elevated mounting positions, broader fields of view, and cost advantages, making them a compelling alternative for robust perception. This paper presents Monocular Probabilistic Height and Depth (MonoPHD), a novel framework that leverages complementary depth and height information, as well as the geometric relationships between instances, to improve depth estimation. To further enhance feature representation, MonoPHD adopts a 3D-weights attention module, which enables more accurate and robust 3D detection by focusing on the most relevant regions and features. Experiments on the KITTI and Rope3D datasets demonstrate that MonoPHD significantly outperforms the baseline method in average precision (AP). This study highlights the potential of monocular 3D object detection using roadside cameras, paving the way for safer, more efficient intelligent transportation systems with enhanced perception capabilities.
