Abstract
The design of a universal intelligent perception architecture enabling real-time environmental awareness is critical for partially observable multi-unmanned vehicle systems. Traditional optimization and game-theoretic approaches struggle in dynamic environments due to their reliance on precise modeling, while existing multi-agent reinforcement learning (RL) methods under partial observability face challenges in convergence efficiency and policy robustness. This paper proposes HER-MATD3, a framework integrating Hindsight Experience Replay with Multi-Agent Twin Delayed Deep Deterministic Policy Gradient, to address these issues. The architecture incorporates a perception module that injects measurement noise to emulate sensor uncertainty under communication constraints, constructing a real-time situational representation to support robust cooperative evasion decisions. By reusing failed samples, HER-MATD3 improves experience utilization and mitigates the learning bottleneck caused by sparse rewards. Simulation results in a pursuit–evasion game demonstrate a 45.6% reduction in convergence time, a 28% increase in post-convergence average reward, and a 65.8% decrease in reward variance compared to MATD3. Physical experiments further show that the learned policy remains executable under noisy perception and restricted communication, supporting practical deployment for cooperative multi-vehicle evasion tasks.
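The core idea of reusing failed samples via Hindsight Experience Replay can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; it assumes a dictionary-based transition format and the standard "final" goal-selection strategy, in which the last state actually reached in a failed episode is substituted as the goal, so the episode yields informative rewards instead of uniform failure. The names `sparse_reward` and `her_final_relabel` are illustrative.

```python
import numpy as np

def sparse_reward(achieved, goal, tol=0.05):
    """Sparse reward: 0 on success (achieved state within tol of goal), -1 otherwise."""
    return 0.0 if np.linalg.norm(achieved - goal) < tol else -1.0

def her_final_relabel(episode):
    """HER 'final' strategy: relabel every transition of a failed episode
    with the episode's last achieved state as the hindsight goal, then
    recompute rewards under that goal."""
    hindsight_goal = episode[-1]["achieved"]
    return [
        {
            "obs": t["obs"],
            "action": t["action"],
            "achieved": t["achieved"],
            "goal": hindsight_goal,
            "reward": sparse_reward(t["achieved"], hindsight_goal),
        }
        for t in episode
    ]

# Toy failed episode: the original goal (1, 1) is never reached,
# so every stored reward is -1 and the episode is uninformative as-is.
episode = [
    {"obs": np.zeros(2), "action": 0, "achieved": np.array([0.2, 0.0]),
     "goal": np.array([1.0, 1.0]), "reward": -1.0},
    {"obs": np.zeros(2), "action": 1, "achieved": np.array([0.5, 0.5]),
     "goal": np.array([1.0, 1.0]), "reward": -1.0},
]
relabeled = her_final_relabel(episode)
# Under the hindsight goal, the final transition now counts as a success,
# so the relabeled episode carries a learning signal.
```

Both the original and the relabeled transitions are added to the replay buffer, which is how HER raises the fraction of experience that carries a non-failure reward in sparse-reward tasks such as pursuit–evasion.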
