Abstract
Despite significant advances in reinforcement learning (RL) in recent years, public skepticism about its application to autonomous driving persists. Ensuring the reliability of decisions made by autonomous driving agents remains a central challenge. Previous research shows that even well-trained driving policy models can make unexpected and hazardous decisions under perceptual uncertainty, potentially leading to serious accidents. To address this issue, we propose a Self-Game Robust Reinforcement Learning (SGRRL) method that keeps an autonomous vehicle's decision-making robust and safe under uncertain disturbances. The proposed framework comprises two core modules: an aggressive policy model and a safe policy model. The aggressive policy model searches the perturbed state space for the most unsafe decision by fitting the maximum cost of unsafe decisions; its role is to mount robustness attacks on the agent's safe policy and induce unsafe behavioral decisions. The safe policy model, in turn, makes the safest possible driving decisions in the face of this interference. To guarantee that the agent outputs the safest decisions, we further design a self-game loss function over the two models. This loss learns the optimal safety policy, pursues the maximum task reward, and simultaneously strengthens the model's robustness; it also balances policy and cost, confining the cost induced by the aggressive policy model's attacks within a preset bound. Finally, we evaluate the proposed technique in simulations of an urban ramp-merging traffic scenario under uncertainty attacks of varying intensity.
The experimental results indicate that our method achieves substantial improvements in performance and safety over baseline algorithms.
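The self-game described above can be illustrated with a minimal toy sketch. Everything here is our illustrative assumption, not the paper's actual models: the safe policy is reduced to a single scalar parameter, the reward and attack cost are simple quadratics, the attacker's worst-case perturbation is computed in closed form on a bounded interval, and the cost constraint is enforced with a fixed penalty weight rather than a learned multiplier.

```python
import numpy as np


def train_sgrrl_sketch(epochs=500, lr=0.05, eps=0.3, lam=2.0, budget=0.0):
    """Toy alternating min-max loop in the spirit of the self-game:
    an 'aggressive' perturbation maximizes a safety cost, while the
    'safe' policy parameter maximizes task reward minus a penalty on
    any cost exceeding the preset budget. All functions are hypothetical
    quadratics chosen so the gradients are analytic."""
    theta = 2.0  # scalar stand-in for the safe-policy parameters
    for _ in range(epochs):
        # Aggressive step: pick phi in [-eps, eps] maximizing
        # cost(theta, phi) = (theta + phi)**2 (closed form on the interval).
        phi = eps if theta >= 0 else -eps
        # Safe-policy step: gradient ascent on
        # reward(theta) - lam * max(0, cost - budget),
        # with reward(theta) = -(theta - 1)**2.
        reward_grad = -2.0 * (theta - 1.0)
        cost = (theta + phi) ** 2
        cost_grad = 2.0 * (theta + phi) if cost > budget else 0.0
        theta += lr * (reward_grad - lam * cost_grad)
    worst_cost = (theta + (eps if theta >= 0 else -eps)) ** 2
    return theta, worst_cost
```

With these toy choices, training without the attacker would settle near the reward optimum and incur a much larger worst-case cost; the self-game instead pulls the policy toward a parameter whose cost stays small even under the strongest admissible perturbation, mirroring how the safe policy model is hardened against the aggressive policy model's attacks.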
