Abstract
An improved Deep Deterministic Policy Gradient (DDPG) algorithm that integrates self-attention mechanisms, adversarial training, and prioritized experience replay is proposed to address the limited state-representation capability, sample efficiency, and robustness of DDPG in complex dynamic environments. First, a multihead self-attention module is built into the Critic network; by computing multidimensional state-action association features in parallel, it strengthens the network's spatial modeling of complex environments, improving the accuracy of Q-value estimation and the stability of training. Second, an adversarial training mechanism is introduced in which adversarial samples are mixed into the training data in a fixed proportion, improving the algorithm's robustness to state disturbances. Third, a prioritized experience replay buffer based on the SumTree structure is designed, raising the reuse efficiency of high-value samples. Finally, dynamic and static scenarios are built on the Gazebo simulation platform and real-world experiments are conducted. The results show that, compared with the original DDPG algorithm, the improved algorithm converges 55.3% faster in dynamic scenarios, reduces Critic loss by 80.2%, and generalizes better across scenarios. Real-world tests further confirm the algorithm's advantages in dynamic obstacle avoidance and trajectory smoothness.
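
The abstract does not give implementation details, but the SumTree-based prioritized replay it mentions follows a well-known pattern. The sketch below is a minimal Python illustration of that pattern under standard proportional-prioritization assumptions; the class names and hyperparameters (`capacity`, `alpha`, `eps`) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a SumTree-backed prioritized replay buffer (assumed
# proportional prioritization p_i = (|TD error| + eps)^alpha; details are
# illustrative, not from the source paper).
import random

class SumTree:
    """Binary tree whose internal nodes hold the sum of their children's
    priorities, enabling O(log n) priority-proportional sampling."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes + leaves
        self.data = [None] * capacity           # stored transitions
        self.write = 0                          # next leaf to overwrite

    def add(self, priority, transition):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf > 0:                         # propagate change to root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def sample(self, value):
        """Descend from the root, following the child whose cumulative
        priority range contains `value`."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):     # stop at a leaf
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.data[idx - self.capacity + 1]

class PrioritizedReplayBuffer:
    def __init__(self, capacity=100_000, alpha=0.6, eps=1e-5):
        self.tree = SumTree(capacity)
        self.alpha = alpha                      # priority exponent
        self.eps = eps                          # keeps priorities nonzero
        self.max_priority = 1.0

    def push(self, transition):
        # New transitions get the current max priority so they are
        # sampled at least once before their TD error is known.
        self.tree.add(self.max_priority ** self.alpha, transition)

    def sample(self, batch_size):
        total = self.tree.tree[0]               # sum of all priorities
        segment = total / batch_size
        leaves, batch = [], []
        for i in range(batch_size):             # stratified sampling
            v = random.uniform(segment * i, segment * (i + 1))
            leaf, data = self.tree.sample(v)
            leaves.append(leaf)
            batch.append(data)
        return leaves, batch

    def update_priorities(self, leaves, td_errors):
        # After a Critic update, re-prioritize sampled transitions by
        # their fresh TD errors so high-value samples are reused more.
        for leaf, err in zip(leaves, td_errors):
            self.max_priority = max(self.max_priority, abs(err) + self.eps)
            self.tree.update(leaf, (abs(err) + self.eps) ** self.alpha)
```

In a DDPG training loop of the kind described, `push` would be called after each environment step, `sample` when drawing a minibatch for the Actor and Critic updates, and `update_priorities` with the Critic's TD errors after each update.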
