Abstract
Autonomous combat unmanned aerial vehicle (UAV) systems represent a critical area of research for global military powers. This study investigates one-on-one close-range air combat scenarios involving UAVs and proposes an autonomous maneuvering decision-making model based on deep reinforcement learning (DRL). A six-degree-of-freedom continuous action space is established to simulate UAV combat environments realistically. In the proposed decision-making model, the reward function combines a sparse global reward based on the combat outcome with a dense guidance reward spanning four key tactical dimensions: attack angle, distance, velocity, and altitude. In addition, we design a value-based prioritized experience replay (PER) mechanism that improves sample efficiency by adaptively balancing old and new experiences, thereby accelerating convergence. Finally, a three-dimensional air combat simulation environment is developed. Experimental results demonstrate that the proposed model converges reliably and is effective for autonomous air combat decision-making, outperforming baseline methods in tactical adaptability and mission success rate.
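The value-based PER mechanism mentioned above can be illustrated with a minimal proportional-prioritization sketch in Python. All names here (`PrioritizedReplayBuffer`, `add`, `sample`) are illustrative rather than taken from the paper; the sketch assumes the standard scheme of sampling transitions with probability proportional to priority raised to an exponent, with oldest-first eviction so new experiences gradually displace old ones.

```python
import random


class PrioritizedReplayBuffer:
    """Minimal proportional PER sketch; names are illustrative, not from the paper."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha        # degree of prioritization (0 = uniform sampling)
        self.buffer = []          # stored transitions
        self.priorities = []      # one priority value per stored transition

    def add(self, transition, priority=1.0):
        # Evict the oldest experience once capacity is reached, so that
        # new experiences gradually displace old ones.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(max(priority, 1e-6))  # keep priorities strictly positive

    def sample(self, batch_size):
        # Sample indices with probability proportional to priority**alpha,
        # balancing high-value (often newer) and older experiences.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.buffer)), weights=probs, k=batch_size)
        return [self.buffer[i] for i in idxs], idxs

    def update_priorities(self, idxs, new_priorities):
        # After a learning step, refresh priorities (e.g. from TD error magnitudes).
        for i, p in zip(idxs, new_priorities):
            self.priorities[i] = max(p, 1e-6)
```

In a DRL training loop, transitions would be added with an initial priority, minibatches drawn via `sample`, and priorities refreshed from the latest value estimates via `update_priorities`; the paper's value-based weighting would replace the placeholder priority values used here.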