Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency. The benefit of using these strategies is demonstrated on some simple abstract learning tasks.
Barto, A.G., Bradtke, S.J., & Singh, S.P. (1991). Real-time learning and control using asynchronous dynamic programming (COINS Technical Report No. 91-57). Amherst, MA: Department of Computer Science, University of Massachusetts.
Holland, J.H. (1986). Escaping brittleness: The possibility of general-purpose learning algorithms applied to rule-based systems. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 2). Los Altos, CA: Morgan Kaufmann.
Moore, A.W. (1991). Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces. Proceedings of the Eighth International Machine Learning Workshop. San Mateo, CA: Morgan Kaufmann.
Moore, A.W., & Atkeson, C.G. (1993). Memory-based reinforcement learning: Efficient computation with prioritized sweeping. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in Neural Information Processing Systems 5. San Mateo, CA: Morgan Kaufmann.
Nilsson, N.J. (1980). Principles of artificial intelligence. San Mateo, CA: Morgan Kaufmann.
Samuel, A.L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210-229.
(Reprinted in E. A. Feigenbaum & J. Feldman [Eds.] [1963], Computers and thought. New York: McGraw-Hill.)
Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
Sutton, R.S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning. San Mateo, CA: Morgan Kaufmann.
Sutton, R.S. (1991). Planning by incremental dynamic programming. Proceedings of the Eighth International Machine Learning Workshop. San Mateo, CA: Morgan Kaufmann.
Sutton, R.S., Barto, A.G., & Williams, R.J. (1992). Reinforcement learning is direct adaptive optimal control. IEEE Control Systems Magazine, 12, 19-22.
Tesauro, G. (1992). Practical issues in temporal difference learning. Advances in Neural Information Processing Systems, 4, 259-266.
Williams, R.J., & Baird, L.C., III (1990). A mathematical analysis of actor-critic architectures for learning optimal controls through incremental dynamic programming. Proceedings of the Sixth Yale Workshop on Adaptive and Learning Systems. New Haven, CT: Yale University Center for Systems Science.