Abstract
The design of a universal intelligent perception architecture enabling real-time environmental awareness is critical for partially observable multi-unmanned vehicle systems. Traditional optimization and game-theoretic approaches struggle in dynamic environments due to their reliance on precise modeling, while existing multi-agent reinforcement learning (RL) methods under partial observability face challenges in convergence efficiency and policy robustness. This paper proposes HER-MATD3, a framework integrating Hindsight Experience Replay with Multi-Agent Twin Delayed Deep Deterministic Policy Gradient, to address these issues. The architecture incorporates a perception module that injects measurement noise to emulate sensor uncertainty under communication constraints, constructing a real-time situational representation to support robust cooperative evasion decisions. By reusing failed samples, HER-MATD3 improves experience utilization and mitigates the learning bottleneck caused by sparse rewards. Simulation results in a pursuit–evasion game demonstrate a 45.6% reduction in convergence time, a 28% increase in post-convergence average reward, and a 65.8% decrease in reward variance compared to MATD3. Physical experiments further show that the learned policy remains executable under noisy perception and restricted communication, supporting practical deployment for cooperative multi-vehicle evasion tasks.
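The core idea of reusing failed samples via Hindsight Experience Replay can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; it assumes a dictionary-based transition format and the standard "final" goal-selection strategy, in which the last state actually reached in a failed episode is substituted as the goal, so the episode yields informative rewards instead of uniform failure. The names `sparse_reward` and `her_final_relabel` are illustrative.

```python
import numpy as np

def sparse_reward(achieved, goal, tol=0.05):
    """Sparse reward: 0 on success (achieved state within tol of goal), -1 otherwise."""
    return 0.0 if np.linalg.norm(achieved - goal) < tol else -1.0

def her_final_relabel(episode):
    """HER 'final' strategy: relabel every transition of a failed episode
    with the episode's last achieved state as the hindsight goal, then
    recompute rewards under that goal."""
    hindsight_goal = episode[-1]["achieved"]
    return [
        {
            "obs": t["obs"],
            "action": t["action"],
            "achieved": t["achieved"],
            "goal": hindsight_goal,
            "reward": sparse_reward(t["achieved"], hindsight_goal),
        }
        for t in episode
    ]

# Toy failed episode: the original goal (1, 1) is never reached,
# so every stored reward is -1 and the episode is uninformative as-is.
episode = [
    {"obs": np.zeros(2), "action": 0, "achieved": np.array([0.2, 0.0]),
     "goal": np.array([1.0, 1.0]), "reward": -1.0},
    {"obs": np.zeros(2), "action": 1, "achieved": np.array([0.5, 0.5]),
     "goal": np.array([1.0, 1.0]), "reward": -1.0},
]
relabeled = her_final_relabel(episode)
# Under the hindsight goal, the final transition now counts as a success,
# so the relabeled episode carries a learning signal.
```

Both the original and the relabeled transitions are added to the replay buffer, which is how HER raises the fraction of experience that carries a non-failure reward in sparse-reward tasks such as pursuit–evasion.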
