
Abstract

Autonomous agents have been central to discussions of future concepts of operations. Reinforcement learning (RL) is the core machine learning area for developing such intelligent agents, particularly in complex and dynamic environments such as battlefields and afflicted areas. This study proposes a large language model (LLM)-based RL system that harnesses the power of LLMs for military RL applications. Users interact with the system through prompts, and three different types of prompting are tested on a weapon selection scenario. The proposed system helps and guides users not only in quickly building an RL agent (optimal policy) but also by providing related theories and other information. Compared with a human-designed RL system, however, the proposed system has some limitations, such as reproducibility and reliability. This study discusses these limitations and suggests some remedies.

Keywords

Large language model, reinforcement learning, autonomous agents, military decision support, weapon selection


Military reinforcement learning with large language model–based agents: a case of weapon selection
