Abstract
Q-value Mixing (QMIX) is a widely used algorithm for multi-agent reinforcement learning. However, multi-agent environments are complex, with high-dimensional action and state spaces, which leads to low exploration efficiency and sparse global rewards in the early stages of QMIX training. To address this issue, we propose an efficient hypernetwork parameter optimization method for QMIX based on differential evolution (DE-QMIX). DE-QMIX encodes the hypernetwork parameters of QMIX as population individuals and obtains the best hypernetwork parameters by performing mutation, crossover, and selection operations on these individuals. The hypernetworks adjust their parameters through gradient descent and feed the updated parameter information back into the current population, improving the overall efficiency of DE-QMIX. By optimizing the hypernetwork parameters, the joint action-value function can be estimated more accurately.
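The evolutionary loop described above (encode parameters as individuals, then mutate, cross over, and select) can be sketched with the standard DE/rand/1/bin variant of differential evolution. This is an illustrative sketch only, not the paper's implementation: the fitness function, population size, and hyperparameters (`F`, `CR`) are assumptions, and the toy fitness stands in for an episode-return signal evaluated with a candidate hypernetwork parameter vector.

```python
import random

def differential_evolution(fitness, dim, pop_size=20, F=0.5, CR=0.9,
                           generations=100, bounds=(-5.0, 5.0), seed=0):
    """Minimal DE/rand/1/bin sketch. `fitness` is minimized; each
    individual stands in for a flattened hypernetwork parameter vector."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # mutation: combine three distinct individuals other than i
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantee at least one mutated gene
            # binomial crossover between the mutant and the current individual
            trial = [
                pop[r1][k] + F * (pop[r2][k] - pop[r3][k])
                if (rng.random() < CR or k == j_rand) else pop[i][k]
                for k in range(dim)
            ]
            s = fitness(trial)
            if s < scores[i]:  # greedy selection: keep the better vector
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]

# Toy fitness: squared distance to a target vector, standing in for the
# (negated) return obtained by a QMIX agent using these hypernetwork weights.
target = [1.0, -2.0, 0.5]
best, score = differential_evolution(
    lambda x: sum((a - b) ** 2 for a, b in zip(x, target)), dim=3)
```

In the full method, the gradient-descent update of the hypernetworks would additionally write the trained parameter vector back into the population, so evolution and gradient learning inform each other.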
