Abstract
Imitation learning struggles to learn an optimal policy from datasets containing both expert and non-expert samples because it cannot discern the quality differences between them. Meanwhile, standard online reinforcement learning (RL) methods incur significant exploration costs and safety risks during environmental interaction. To address these challenges, this study develops a lane-changing model for autonomous vehicles using the bootstrapping error accumulation reduction (BEAR) algorithm. The study first examines the distributional shift between the behavior policy and the learned policy in offline RL, then incorporates the BEAR algorithm, which constrains the learned policy to the support set of the behavior policy, to mitigate this shift. On this basis, it proposes a lane-changing policy learning method built on BEAR in the offline RL setting, designing the state space, action set, and reward function. The reward function is tailored to guide the autonomous vehicle in executing lane changes while balancing safety, ride comfort, and traffic efficiency. Finally, the lane-changing policy is learned from a dataset containing both expert and non-expert samples. Test results indicate that the lane-changing policy obtained with this method achieves higher success rates and safety levels than policies derived via imitation learning.
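The support-set constraint in BEAR is typically enforced by penalizing a sampled maximum mean discrepancy (MMD) between actions drawn from the learned policy and actions in the offline dataset at the same states. The sketch below is illustrative only, not the paper's implementation: the function names and the Laplacian-kernel bandwidth `sigma` are assumptions for the example.

```python
import numpy as np

def laplacian_kernel(x, y, sigma=10.0):
    # x: (n, d), y: (m, d) action batches; returns (n, m) kernel matrix
    dist = np.abs(x[:, None, :] - y[None, :, :]).sum(axis=-1)
    return np.exp(-dist / sigma)

def mmd_squared(policy_actions, data_actions, sigma=10.0):
    # Sampled squared MMD between policy actions and dataset actions.
    # A small value means the policy stays within the support of the
    # behavior policy; BEAR penalizes (or constrains) this quantity.
    k_pp = laplacian_kernel(policy_actions, policy_actions, sigma).mean()
    k_pd = laplacian_kernel(policy_actions, data_actions, sigma).mean()
    k_dd = laplacian_kernel(data_actions, data_actions, sigma).mean()
    return k_pp - 2.0 * k_pd + k_dd

# Illustration: actions drawn from the same distribution as the data
# give a much smaller MMD than actions drawn far outside its support.
rng = np.random.default_rng(0)
in_support = rng.normal(0.0, 1.0, size=(64, 2))
data = rng.normal(0.0, 1.0, size=(64, 2))
out_of_support = rng.normal(4.0, 1.0, size=(64, 2))
assert mmd_squared(in_support, data) < mmd_squared(out_of_support, data)
```

In the full algorithm this penalty is added to the actor's objective (with a Lagrange multiplier), so policy improvement is restricted to actions the dataset supports, which is what limits bootstrapping-error accumulation.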
