Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency. The benefit of using these strategies is demonstrated on some simple abstract learning tasks.
Barto, A.G., Bradtke, S.J., & Singh, S.P. (1991). Real-time learning and control using asynchronous dynamic programming (COINS Technical Report No. 91-57). Amherst, MA: Department of Computer Science, University of Massachusetts.
Holland, J.H. (1986). Escaping brittleness: The possibility of general-purpose learning algorithms applied to rule-based systems. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 2). Los Altos, CA: Morgan Kaufmann.
Moore, A.W. (1991). Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces. Proceedings of the Eighth International Machine Learning Workshop. San Mateo, CA: Morgan Kaufmann.
Moore, A.W., & Atkeson, C.G. (1993). Memory-based reinforcement learning: Efficient computation with prioritized sweeping. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in Neural Information Processing Systems 5. San Mateo, CA: Morgan Kaufmann.
Nilsson, N.J. (1980). Principles of artificial intelligence. San Mateo, CA: Morgan Kaufmann.
Samuel, A.L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210-229.
(Reprinted in E. A. Feigenbaum & J. Feldman [Eds.] [1963], Computers and thought. New York: McGraw-Hill.)
Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
Sutton, R.S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning. San Mateo, CA: Morgan Kaufmann.
Sutton, R.S. (1991). Planning by incremental dynamic programming. Proceedings of the Eighth International Machine Learning Workshop. San Mateo, CA: Morgan Kaufmann.
Sutton, R.S., Barto, A.G., & Williams, R.J. (1992). Reinforcement learning is direct adaptive optimal control. IEEE Control Systems Magazine, 12, 19-22.
Tesauro, G. (1992). Practical issues in temporal difference learning. Advances in Neural Information Processing Systems, 4, 259-266.
Williams, R.J., & Baird, L.C., III (1990). A mathematical analysis of actor-critic architectures for learning optimal controls through incremental dynamic programming. Proceedings of the Sixth Yale Workshop on Adaptive and Learning Systems. New Haven, CT: Yale University Center for Systems Science.