Abstract
In this paper, two novel exploration strategies are proposed for n-person general-sum multiagent reinforcement learning with sequential action selection. The underlying learning process, called an extensive Markov game, is modeled as a set of successive extensive-form games with perfect information. We introduce an estimate of the value of taking actions with respect to other agents' preferences, called the associative Q-value, which can be used to select actions probabilistically according to a Boltzmann distribution. Simulation results demonstrate the effectiveness of the proposed exploration strategies when used in our previously introduced extensive-Q learning methods. Given the complexity of existing methods for computing Nash equilibrium points, extensive-Q learning is more convenient for dynamic-task multiagent systems with more than two agents, provided that sequential action selection among agents can be assumed.
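The abstract refers to Boltzmann (softmax) action selection over the associative Q-values. A minimal sketch of that selection rule is shown below; the function name, signature, and temperature parameter are illustrative assumptions, not details taken from the paper:

```python
import math
import random

def boltzmann_select(q_values, temperature=1.0, rng=random):
    """Pick an action index with probability proportional to
    exp(Q(a) / temperature), i.e. Boltzmann (softmax) exploration.

    Hypothetical helper: `q_values` stands in for the associative
    Q-values of the available actions in the current game state.
    """
    # Subtract the max Q-value before exponentiating for numerical stability.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Sample an action according to the resulting probabilities.
    r = rng.random()
    cumulative = 0.0
    for action, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return action
    return len(q_values) - 1  # guard against floating-point round-off

# At high temperature the choice is near-uniform (exploration); as the
# temperature decreases, selection concentrates on the highest Q-value
# (exploitation).
```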
