Abstract
This review provides a comprehensive analysis of cooperative Multi-Agent Reinforcement Learning (MARL) approaches for robotic systems, with particular emphasis on methodological foundations, practical implementations, and emerging challenges. We first examine the evolution of distributed intelligence in robotics, tracing its development from early architectures to modern learning-based frameworks. Our analysis focuses on two complementary paradigms: cooperative methods utilizing Centralized Training with Decentralized Execution (CTDE), and hierarchical approaches that address complexity through temporal and task decomposition. We systematically compare actor-critic methods, value-based approaches, and hierarchical frameworks across theoretical foundations, implementation characteristics, and application domains spanning aerial, ground, and maritime robotics. Our comparative analysis reveals important trade-offs between expressiveness, computational efficiency, and implementation complexity, highlighting that method selection must align with specific application requirements. Furthermore, we identify critical challenges, including the sim-to-real gap, scalability constraints, communication limitations, safety verification, and coordination in heterogeneous teams, mapping promising research directions to address these barriers to widespread deployment. This survey bridges theoretical understanding with practical implementation, providing a structured framework for researchers and practitioners working on multi-agent learning for advanced robotic systems.
