Abstract
The multiple pursuers and evaders game may be represented as a Markov game. Using this modeling, one may interpret each player as a decentralized unit that has to work independently in order to complete a task. This is a distributed multiagent decision problem and several different possible solutions have already been proposed. However, most solutions require some sort of central coordination. In this paper, we intend to model each player as a learning automaton and let them evolve and adapt in order to solve the difficult problem they have at hand. We are also going to show that, using the proposed learning process, the players’ policies will converge to an equilibrium point. Simulations of such scenarios with multiple pursuers and evaders are presented in order to show the feasibility of the approach.
Get full access to this article
View all access options for this article.
