Abstract
With the continuous growth of urban traffic flow, intelligent traffic signal control (TSC) has become an important means of improving traffic efficiency. In particular, the deep reinforcement learning (DRL) algorithm Deep Q-Network (DQN) has been successfully applied to TSC. We focus on three problems: the complex state representations of existing traffic models, the insufficient performance of DQN when a multilayer perceptron (MLP) is used as the action network, and the over-estimation of Q-values, which degrades convergence. To mine latent traffic-state information from limited features and improve model efficiency, we propose a DQN softmax cross-entropy (DQN-SCE) TSC algorithm. First, the model uses the current phase and the queue length as the state representation and defines the reward function solely in terms of queue length. Second, a multi-head self-attention mechanism is used to fuse the state features. Finally, we propose an improved DRL algorithm, DQN-SCE, which augments DQN with a cross-entropy loss between the current-action outputs of the target network and the action network. Experimental results on CityFlow show that the proposed TSC algorithm outperforms several traditional and reinforcement learning methods on the metric of average travel time, and that it also performs well compared with the standard DQN and several of its improved variants.
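To make the core idea of DQN-SCE concrete, the following is a minimal sketch of what the augmented loss might look like. The abstract only states that a cross-entropy term over current actions is added between the target network and the action network; the exact formulation, the function name `dqn_sce_loss`, and the weighting parameter `sce_weight` are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dqn_sce_loss(online_net, target_net, batch, gamma=0.99, sce_weight=1.0):
    """Hypothetical sketch: standard DQN TD loss plus a softmax
    cross-entropy (SCE) term between the action distributions of the
    online (action) network and the target network on current states."""
    states, actions, rewards, next_states, dones = batch

    # Standard DQN temporal-difference loss.
    q_online = online_net(states)                              # (B, num_actions)
    q_taken = q_online.gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * q_next
    td_loss = F.smooth_l1_loss(q_taken, td_target)

    # Assumed SCE term: cross-entropy between the softmax action
    # distributions produced by the two networks on the current states.
    with torch.no_grad():
        target_dist = F.softmax(target_net(states), dim=1)
    log_online_dist = F.log_softmax(q_online, dim=1)
    sce_loss = -(target_dist * log_online_dist).sum(dim=1).mean()

    return td_loss + sce_weight * sce_loss
```

Intuitively, such a term would pull the action network's softmax distribution toward that of the slower-moving target network, which is one plausible way to damp the Q-value over-estimation the abstract identifies.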
