Abstract
Given the recent impact of deep reinforcement learning in training agents to win complex games such as StarCraft and DotA (Defense of the Ancients), there has been a surge of research into applying learning-based techniques to professional wargaming, battlefield simulation, and modeling. Real-time strategy games and simulators have become a valuable resource for operational planning and military research. However, recent work has shown that such learning-based approaches are highly susceptible to adversarial perturbations. In this paper, we investigate the robustness of an agent trained for a command and control (C2) task in an environment controlled by an active adversary. The C2 agent is trained on custom StarCraft II maps using the state-of-the-art RL algorithms Asynchronous Advantage Actor-Critic (A3C) and Proximal Policy Optimization (PPO). We empirically show that an agent trained using these algorithms is highly susceptible to noise injected by the adversary and investigate the effects these perturbations have on the performance of the trained agent.
