Abstract

Learning to control dynamic systems with unknown models is a challenging research problem. However, most previous work that learns qualitative control rules does not construct qualitative states; a proper partition of the continuous-state variables has to be designed by human users and given to the learning programs. We design a new learning method that learns an appropriate qualitative state representation and the control rules simultaneously. Our method can aggressively partition the continuous-state variables into finer, discrete ranges until control rules based on these ranges are learned. As a case study, we apply our method to the benchmark control problem of cart-pole balancing (also known as the inverted pendulum). Experimental results show that our method not only derives different partitions for cart-pole systems with different parameters but also learns to control the systems for an extended period of time from random initial positions.
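The idea of splitting continuous-state variables into progressively finer discrete ranges can be illustrated with a minimal sketch. This is not the authors' algorithm: the `Partition` class, the midpoint split rule, and the variable bounds are all assumptions made for illustration, and the criterion for when to split (which the abstract ties to whether control rules can be learned) is left out.

```python
from bisect import bisect_right


class Partition:
    """Discrete ranges over [lo, hi] for one continuous state variable.

    Hypothetical helper for illustration only.
    """

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.cuts = []  # sorted interior cut points; empty means one range

    def index(self, x):
        # Number of the range that x falls into (0 .. len(self.cuts)).
        return bisect_right(self.cuts, x)

    def bounds(self, i):
        # Lower and upper edge of range i.
        edges = [self.lo] + self.cuts + [self.hi]
        return edges[i], edges[i + 1]

    def split(self, i):
        # Refine range i by cutting it at its midpoint (assumed split rule).
        a, b = self.bounds(i)
        self.cuts.insert(i, (a + b) / 2.0)


def qualitative_state(partitions, values):
    # Map a vector of continuous values to a tuple of range indices.
    return tuple(p.index(v) for p, v in zip(partitions, values))


# Illustrative bounds for cart position and pole angle (assumed values).
x = Partition(-2.4, 2.4)
theta = Partition(-0.2, 0.2)
theta.split(0)  # one range becomes two, cut at 0.0
theta.split(1)  # the upper range is refined again, cut at 0.1
state = qualitative_state([x, theta], [1.0, 0.15])
```

Here the cart position keeps a single coarse range while the pole angle ends up with three, so the same continuous observation maps to a qualitative state whose resolution differs per variable.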

Keywords

adaptive control, reinforcement learning, automatic quantization

Learning to Control Dynamic Systems with Automatic Quantization
