Abstract
Surrounding-vehicle trajectory prediction is a crucial component of autonomous driving. Current trajectory prediction research relies primarily on publicly available datasets that have already been processed by perception pipelines, rather than on raw sensor perception information. With the growing emphasis on visual perception, integrating visual perception with trajectory prediction will be essential for the practical deployment of prediction algorithms. This paper proposes a multimodal vehicle trajectory prediction model based on visual perception information (VP-MTP). First, a vehicle detection network obtains the position coordinates of vehicles in consecutive bird’s eye view (BEV) frames. The discrete position coordinates are then assembled into complete vehicle historical trajectories by a processing block comprising affine coordinate transformation, vehicle tracking, and trajectory smoothing (ATS). To address the high computational complexity of the standard Transformer, the input sequence is decomposed along the time dimension. In addition, layer normalization positions are adjusted, convolutional feed-forward layers are introduced, and hierarchical encoding is employed to improve feature extraction capability and encoding efficiency, yielding a hierarchical Transformer encoder based on convolutional feed-forward layers with time-decomposition attention (HT-CTA). Because clustering-based multimodal training strategies require substantial manual effort and adapt poorly to complex scenarios, learnable anchor embedding features are introduced as model parameters to construct a multimodal trajectory decoder.
Finally, experiments on the Waymo motion and nuScenes datasets demonstrate that, compared with existing baseline models, VP-MTP achieves average improvements of 12.4% and 9.9% in minimum Average Displacement Error (minADE) and minimum Final Displacement Error (minFDE) on the Waymo dataset, and 9.3% and 10.0% on the nuScenes dataset, respectively. The model thus delivers higher prediction accuracy and strong multimodality, enabling multimodal trajectory prediction directly from raw visual perception information.
