Abstract
Q-value Mixing (QMIX) is a widely used algorithm for multi-agent reinforcement learning. However, multi-agent environments are complex, with high-dimensional action and state spaces, which leads to low exploration efficiency and sparse global rewards in the early stages of QMIX training. To address this issue, we propose an efficient hypernetwork parameter optimization method for QMIX based on differential evolution (DE-QMIX). DE-QMIX encodes the hypernetwork parameters of QMIX as population individuals and obtains the best hypernetwork parameters by performing mutation, crossover, and selection operations on these individuals. The hypernetworks adjust their parameters through gradient descent and feed the updated parameter information back into the current population, improving the overall efficiency of DE-QMIX. By optimizing the hypernetwork parameters, the joint action-value function can be estimated more accurately.
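The evolutionary loop described above (encode parameters as individuals, then mutate, cross over, and select) can be sketched with the standard DE/rand/1/bin variant of differential evolution. This is an illustrative sketch only, not the paper's implementation: the fitness function, population size, and hyperparameters (`F`, `CR`) are assumptions, and the toy fitness stands in for an episode-return signal evaluated with a candidate hypernetwork parameter vector.

```python
import random

def differential_evolution(fitness, dim, pop_size=20, F=0.5, CR=0.9,
                           generations=100, bounds=(-5.0, 5.0), seed=0):
    """Minimal DE/rand/1/bin sketch. `fitness` is minimized; each
    individual stands in for a flattened hypernetwork parameter vector."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # mutation: combine three distinct individuals other than i
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantee at least one mutated gene
            # binomial crossover between the mutant and the current individual
            trial = [
                pop[r1][k] + F * (pop[r2][k] - pop[r3][k])
                if (rng.random() < CR or k == j_rand) else pop[i][k]
                for k in range(dim)
            ]
            s = fitness(trial)
            if s < scores[i]:  # greedy selection: keep the better vector
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]

# Toy fitness: squared distance to a target vector, standing in for the
# (negated) return obtained by a QMIX agent using these hypernetwork weights.
target = [1.0, -2.0, 0.5]
best, score = differential_evolution(
    lambda x: sum((a - b) ** 2 for a, b in zip(x, target)), dim=3)
```

In the full method, the gradient-descent update of the hypernetworks would additionally write the trained parameter vector back into the population, so evolution and gradient learning inform each other.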
