Abstract
In this paper, two novel exploration strategies are proposed for n-person general-sum multiagent reinforcement learning with sequential action selection. The underlying learning process, called an extensive Markov game, is modeled as a set of successive extensive-form games with perfect information. We introduce an estimate of the value of taking actions with respect to other agents' preferences, called the associative Q-value, which can be used to select actions probabilistically according to a Boltzmann distribution. Simulation results demonstrate the effectiveness of the proposed exploration strategies when used in our previously introduced extensive-Q learning methods. Given the complexity of existing methods for computing Nash equilibrium points, extensive-Q learning is more convenient for dynamic-task multiagent systems with more than two agents, provided that sequential action selection among agents can be assumed.
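The abstract refers to Boltzmann (softmax) action selection over the associative Q-values. A minimal sketch of that selection rule is shown below; the function name, signature, and temperature parameter are illustrative assumptions, not details taken from the paper:

```python
import math
import random

def boltzmann_select(q_values, temperature=1.0, rng=random):
    """Pick an action index with probability proportional to
    exp(Q(a) / temperature), i.e. Boltzmann (softmax) exploration.

    Hypothetical helper: `q_values` stands in for the associative
    Q-values of the available actions in the current game state.
    """
    # Subtract the max Q-value before exponentiating for numerical stability.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Sample an action according to the resulting probabilities.
    r = rng.random()
    cumulative = 0.0
    for action, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return action
    return len(q_values) - 1  # guard against floating-point round-off

# At high temperature the choice is near-uniform (exploration); as the
# temperature decreases, selection concentrates on the highest Q-value
# (exploitation).
```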
