Goal Directed Adaptive Behavior in Second-Order Neural Networks: The MAXSON family of architectures

Abstract

The paper presents a neural network architecture (MAXSON) based on second-order connections that can learn a multiple goal approach/avoid task using reinforcement from the environment. It also enables an agent to learn vicariously, from the successes and failures of other agents. The paper shows that MAXSON can learn certain spatial navigation tasks much faster than traditional Q-learning, as well as learn goal directed behavior, increasing the agent's chances of long-term survival. The paper shows that an extension of MAXSON (V-MAXSON) enables agents to learn vicariously, and this improves the overall survivability of the agent population.

Keywords

Second-order neural network, simulated autonomous agents, reinforcement learning, vicarious learning.
