Abstract
Despite significant advances in reinforcement learning (RL) in recent years, public skepticism about its application to autonomous driving persists. Ensuring the reliability of decisions made by autonomous driving agents remains a central challenge. Previous research shows that even well-trained driving policy models can make unexpected and hazardous decisions under perceptual uncertainty, potentially leading to serious accidents. To address this issue, we propose a Self-Game Robust Reinforcement Learning (SGRRL) method that keeps an autonomous vehicle's decision-making robust and safe under uncertain disturbances. The proposed framework comprises two core modules: an aggressive policy model and a safe policy model. The aggressive policy model searches the perturbed state space for the most unsafe decision by fitting the maximum cost of unsafe decisions; its role is to mount robustness attacks on the agent's safe policy and induce unsafe behavioral decisions. The safe policy model, in turn, makes the safest possible driving decisions in the face of this interference. To guarantee that the agent outputs the safest decisions, we further design a self-game loss function over the two models. This loss learns the optimal safety policy, pursues the maximum task reward, and simultaneously strengthens the model's robustness; it also balances policy and cost, confining the cost induced by the aggressive policy model's attacks within a preset bound. Finally, we evaluate the proposed technique in simulations of an urban ramp-merging traffic scenario under uncertainty attacks of varying intensity.
The experimental results indicate that our method achieves substantial improvements in performance and safety over baseline algorithms.
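The self-game described above can be illustrated with a minimal toy sketch. Everything here is our illustrative assumption, not the paper's actual models: the safe policy is reduced to a single scalar parameter, the reward and attack cost are simple quadratics, the attacker's worst-case perturbation is computed in closed form on a bounded interval, and the cost constraint is enforced with a fixed penalty weight rather than a learned multiplier.

```python
import numpy as np


def train_sgrrl_sketch(epochs=500, lr=0.05, eps=0.3, lam=2.0, budget=0.0):
    """Toy alternating min-max loop in the spirit of the self-game:
    an 'aggressive' perturbation maximizes a safety cost, while the
    'safe' policy parameter maximizes task reward minus a penalty on
    any cost exceeding the preset budget. All functions are hypothetical
    quadratics chosen so the gradients are analytic."""
    theta = 2.0  # scalar stand-in for the safe-policy parameters
    for _ in range(epochs):
        # Aggressive step: pick phi in [-eps, eps] maximizing
        # cost(theta, phi) = (theta + phi)**2 (closed form on the interval).
        phi = eps if theta >= 0 else -eps
        # Safe-policy step: gradient ascent on
        # reward(theta) - lam * max(0, cost - budget),
        # with reward(theta) = -(theta - 1)**2.
        reward_grad = -2.0 * (theta - 1.0)
        cost = (theta + phi) ** 2
        cost_grad = 2.0 * (theta + phi) if cost > budget else 0.0
        theta += lr * (reward_grad - lam * cost_grad)
    worst_cost = (theta + (eps if theta >= 0 else -eps)) ** 2
    return theta, worst_cost
```

With these toy choices, training without the attacker would settle near the reward optimum and incur a much larger worst-case cost; the self-game instead pulls the policy toward a parameter whose cost stays small even under the strongest admissible perturbation, mirroring how the safe policy model is hardened against the aggressive policy model's attacks.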
