Continuous-time path planning for multi-agents with fuzzy reinforcement learning

Abstract

Multi-agent systems have many applications, such as robot navigation, distributed control, and data mining. Reinforcement learning (RL) is a popular method for multi-agent path planning. However, conventional RL algorithms need an accurate representation of the state space, which is practical only when that space is small and discrete. To plan paths for multiple agents in continuous time, this paper approximates the Q-values with fuzzy logic so that the modified RL algorithm can operate in a continuous state space. The proposed fuzzy reinforcement learning method combines the fuzzy Q-iteration algorithm with a modified WoLF-PHC (Win or Learn Fast policy hill-climbing) algorithm. The existence of a solution and the convergence of the algorithm are proven. The continuous-time planning algorithm is applied to a cooperative task performed by two Khepera mobile robots. The experimental results show the effectiveness of the new method for continuous-time multi-agent path planning.
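The abstract does not state the update rules. As a hedged illustration of the general technique only, the sketch below implements synchronous fuzzy Q-iteration in the form described by Busoniu et al. Normalized triangular membership functions are placed on a grid of cores x_i. One parameter theta[i, j] is stored for each core and each discrete action u_j, and the continuous-state estimate is Q(x, u_j) = sum_i mu_i(x) * theta[i, j]. Everything else is an assumption made for illustration and is not the paper's setup: a single agent on a one-dimensional state space, deterministic dynamics, a goal reward, and the hypothetical helper names `triangular_memberships`, `fuzzy_q_iteration`, `q_hat`, `toy_step`, and `toy_reward`. The multi-agent coordination through the modified WoLF-PHC algorithm is not shown.

```python
import numpy as np


def triangular_memberships(x, centers):
    """Normalized triangular membership degrees of a scalar state x.

    `centers` is a uniform grid of membership-function cores (an assumption);
    the returned degrees are non-negative and sum to 1.
    """
    x = float(np.clip(x, centers[0], centers[-1]))
    width = centers[1] - centers[0]
    mu = np.maximum(0.0, 1.0 - np.abs(x - centers) / width)
    return mu / mu.sum()


def fuzzy_q_iteration(centers, actions, step, reward, gamma=0.9, tol=1e-8, max_iter=1000):
    """Synchronous fuzzy Q-iteration for a deterministic model x' = step(x, u).

    theta[i, j] approximates Q(centers[i], actions[j]).
    """
    theta = np.zeros((len(centers), len(actions)))
    # Memberships of every successor state step(x_i, u_j): shape (n, m, n).
    next_mu = np.array([[triangular_memberships(step(x, u), centers) for u in actions]
                        for x in centers])
    rewards = np.array([[reward(x, u) for u in actions] for x in centers])
    for _ in range(max_iter):
        # q_next[i, j, k] = Q-hat(step(x_i, u_j), u_k), shape (n, m, m).
        q_next = next_mu @ theta
        new_theta = rewards + gamma * q_next.max(axis=2)
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta
        theta = new_theta
    return theta


def q_hat(x, theta, centers):
    """Continuous-state estimate Q(x, u_j) = sum_i mu_i(x) * theta[i, j]."""
    return triangular_memberships(x, centers) @ theta


# Hypothetical toy task: one agent on [0, 1] must reach the goal at x = 1.
CENTERS = np.linspace(0.0, 1.0, 11)
ACTIONS = np.array([-0.1, 0.0, 0.1])


def toy_step(x, u):
    # Deterministic motion, clipped to the state interval.
    return float(np.clip(x + u, 0.0, 1.0))


def toy_reward(x, u):
    # Reward 1 whenever the successor state is the goal.
    return 1.0 if toy_step(x, u) >= 1.0 - 1e-9 else 0.0


theta = fuzzy_q_iteration(CENTERS, ACTIONS, toy_step, toy_reward)
q_between = q_hat(0.55, theta, CENTERS)  # a state lying between two cores
best_action = ACTIONS[int(np.argmax(q_between))]
print(best_action)  # prints 0.1
```

Because the normalized membership degrees form a convex combination, each sweep is a contraction in the max norm with factor gamma, so the loop terminates; this is the standard argument for fuzzy Q-iteration and may differ from the proof given in the paper. The greedy action at x = 0.55, which is not a grid core, is obtained by interpolating the parameters of the two neighboring cores.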

Keywords

Fuzzy reinforcement learning; multi-agents; path planning
