Decentralized strategy selection with learning automata for multiple pursuer–evader games

Abstract

The multiple pursuers and evaders game may be represented as a Markov game. Under this model, each player can be treated as a decentralized unit that must work independently to complete a task. This is a distributed multiagent decision problem, and several solutions have already been proposed; most of them, however, require some form of central coordination. In this paper, we model each player as a learning automaton and let the automata evolve and adapt in order to solve the problem at hand. We also show that, under the proposed learning process, the players’ policies converge to an equilibrium point. Simulations of scenarios with multiple pursuers and evaders are presented to show the feasibility of the approach.
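To make the idea of a player modeled as a learning automaton concrete, the following is a minimal sketch of a single automaton that adapts its action probabilities from a scalar reward. The abstract does not state which update rule is used, so the linear reward-inaction scheme here is an assumption chosen because it is a standard one in the learning automata literature. The names `lri_update`, `sample_action`, and `run`, the step size, and the stationary reward probabilities are all hypothetical and serve only as an illustration.

```python
import random


def lri_update(probs, action, reward, step=0.05):
    """Linear reward-inaction update (assumed scheme, not taken from the paper).

    Moves probability mass toward `action` in proportion to a reward in [0, 1].
    A reward of 0 leaves the probability vector unchanged.
    """
    return [
        p + step * reward * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(probs)
    ]


def sample_action(probs, rng):
    """Draw an action index according to the current probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def run(reward_probs, steps=2000, step=0.05, seed=0):
    """Let one automaton learn against a stationary random environment.

    `reward_probs[a]` is the (hypothetical) chance that action `a` is rewarded.
    The automaton only sees its own action and the resulting reward.
    """
    rng = random.Random(seed)
    probs = [1.0 / len(reward_probs)] * len(reward_probs)
    for _ in range(steps):
        action = sample_action(probs, rng)
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        probs = lri_update(probs, action, reward, step)
    return probs


if __name__ == "__main__":
    print(run([0.2, 0.8]))
```

In a decentralized setting, each player would run its own copy of such an update using only its own action and reward, with no central coordinator; the environment in this sketch is a stand-in for the other players and the game dynamics.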

Keywords

Learning, pursuer–evader games, intelligent systems, reinforcement learning, learning automata
