MonoPHD: Monocular probabilistic height and depth estimation for 3D object detection in roadside perception

Abstract

Roadside camera-driven 3D object detection is crucial for intelligent transportation systems, as it extends perception beyond the limitations of vehicle-mounted sensors and enhances road safety. Vehicle-mounted sensors often suffer from obstructed fields of view and limited long-range perception, whereas roadside cameras offer elevated mounting positions, broader fields of view, and lower cost, making them a compelling option for robust perception. This paper presents Monocular Probabilistic Height and Depth (MonoPHD), a novel framework that exploits the complementarity between depth and height cues, together with the geometric relationships between instances, to improve depth estimation. To further enhance feature representation, MonoPHD adopts a 3D-weight attention module that focuses on the most relevant regions and features, yielding more accurate and robust 3D detection. Experiments on the KITTI and Rope3D datasets show that MonoPHD significantly outperforms the baseline method in average precision (AP). This study highlights the potential of monocular 3D object detection with roadside cameras, paving the way for safer and more efficient intelligent transportation systems with enhanced perception capabilities.
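The depth and height complementarity mentioned above rests on simple geometry for an elevated camera: once the mounting height and pitch are known, a point's image row plus its height above the ground fixes its depth. This is the intuition behind height-based roadside detectors such as BEVHeight. The sketch below assumes a pinhole camera with zero roll, mounted `cam_height` metres above flat ground and pitched down by `pitch` radians. The function name and parameters are hypothetical, and MonoPHD's actual formulation may differ.

```python
import math


def depth_from_height(v, f_y, c_v, cam_height, pitch, point_height=0.0):
    """Camera-frame depth (along the optical axis) of a point seen at image row v.

    Hypothetical setup, not taken from the paper: pinhole camera with zero roll,
    mounted cam_height metres above flat ground, pitched down by `pitch` radians,
    with image rows increasing downward.
    """
    # Angle of the viewing ray below the horizontal plane through the camera.
    alpha = pitch + math.atan2(v - c_v, f_y)
    # Vertical drop from the camera centre to the point.
    drop = cam_height - point_height
    if alpha <= 0 or drop <= 0:
        raise ValueError("the ray never reaches a point below the camera at this height")
    # Horizontal ground distance from the camera to the point.
    ground_dist = drop / math.tan(alpha)
    # Project the (forward, down) offset onto the pitched optical axis.
    return ground_dist * math.cos(pitch) + drop * math.sin(pitch)


if __name__ == "__main__":
    # Camera 6 m up, pitched 30 degrees down; a ground point on the principal row.
    print(depth_from_height(v=540, f_y=1000, c_v=540, cam_height=6.0,
                            pitch=math.radians(30)))  # approximately 12.0 (= 6 / sin 30°)
```

Raising `point_height` shrinks the vertical drop and therefore the depth for the same image row, which is why an error in the predicted height translates directly into a depth error.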
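One common way to make depth (or height) prediction probabilistic, in the spirit of PGD (Wang et al., CoRL 2021), is to predict a categorical distribution over discrete bins and take its expectation, using the spread of the distribution as an uncertainty. The sketch below assumes that formulation and then fuses two independent estimates, for instance a directly regressed depth and a height-derived depth, by inverse-variance weighting. The fusion rule is an illustrative choice, not necessarily the one MonoPHD uses, and both function names are hypothetical.

```python
import numpy as np


def expected_depth(logits, bin_centers):
    """Mean and variance of a categorical depth distribution.

    logits: (..., K) unnormalised scores over K depth bins (hypothetical head output).
    bin_centers: (K,) depth value in metres represented by each bin.
    """
    logits = np.asarray(logits, dtype=float)
    bin_centers = np.asarray(bin_centers, dtype=float)
    # Numerically stable softmax over the bin axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=-1, keepdims=True)
    mean = (probs * bin_centers).sum(axis=-1)
    # Variance of the distribution, used here as the estimate's uncertainty.
    var = (probs * (bin_centers - np.expand_dims(mean, -1)) ** 2).sum(axis=-1)
    return mean, var


def fuse_inverse_variance(means, variances):
    """Fuse independent Gaussian estimates stacked along axis 0."""
    means = np.asarray(means, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    fused_var = 1.0 / weights.sum(axis=0)
    fused_mean = (weights * means).sum(axis=0) * fused_var
    return fused_mean, fused_var
```

Inverse-variance weighting is optimal only when the two estimates have independent, roughly Gaussian errors; a learned weighting is an equally plausible alternative.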
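The abstract does not define the 3D-weight attention module. A well-known parameter-free option that produces full 3D (channel × height × width) weights is SimAM (Yang et al., ICML 2021), and the NumPy sketch below assumes a SimAM-style module purely for illustration. Each activation is scored by its squared deviation from its channel's spatial mean, and the feature map is rescaled by the sigmoid of the resulting inverse energy.

```python
import numpy as np


def simam(x, e_lambda=1e-4):
    """SimAM-style parameter-free attention over a feature map x of shape (B, C, H, W)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[2] * x.shape[3] - 1
    # Squared deviation of every activation from its channel's spatial mean.
    d = (x - x.mean(axis=(2, 3), keepdims=True)) ** 2
    # Per-channel variance estimate (divides by H*W - 1).
    var = d.sum(axis=(2, 3), keepdims=True) / n
    # Inverse of the minimal energy: larger for more distinctive activations.
    inv_energy = d / (4.0 * (var + e_lambda)) + 0.5
    # Sigmoid yields one weight per (channel, row, column) position.
    weights = 1.0 / (1.0 + np.exp(-inv_energy))
    return x * weights
```

Because the inverse energy is at least 0.5, every weight lies between sigmoid(0.5) and 1, so the module only attenuates features, attenuating typical activations more than distinctive ones, and adds no learnable parameters.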
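The comparison above is reported in average precision. On KITTI, monocular 3D detection results are conventionally reported as AP|R40, which averages the interpolated precision at the 40 recall levels 1/40, 2/40, ..., 1. The abstract does not state the exact protocol used for either dataset, so the sketch below only illustrates the metric definition; matching detections to ground truth with 3D IoU thresholds, which produces the precision/recall curve, is omitted.

```python
import numpy as np


def ap_r40(recall, precision):
    """Average precision over 40 equally spaced recall levels (AP|R40).

    recall, precision: points of a precision/recall curve for one class,
    e.g. obtained by sweeping the detection score threshold.
    """
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    total = 0.0
    # Recall levels 1/40, 2/40, ..., 40/40 (recall 0 is excluded in R40).
    for r in np.arange(1, 41) / 40.0:
        reached = recall >= r
        # Interpolated precision: best precision at any recall >= r.
        total += precision[reached].max() if reached.any() else 0.0
    return total / 40.0
```

For example, a curve that keeps precision 1.0 up to recall 0.5 and drops to 0.5 at recall 1.0 scores 0.75.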

Keywords

3D object detection; autonomous driving; deep learning; roadside perception system; purely visual algorithm research

