B-Learning: A Reinforcement Learning Variant for the Control of a Plant

Abstract

This paper presents a new reinforcement learning scheme called B-Learning. The approach estimates the expected benefit of each action relative to the current policy. The algorithm performs a one-step-ahead exhaustive search over the action space and allows additional constraints to be introduced. The method is successfully applied to the control of a water production plant.
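
The abstract gives no update rule, so the following is only a minimal sketch of the two ideas it names: a per-action benefit estimate measured against the current policy, and a one-step-ahead exhaustive search over a discrete action set with constraints. The tabular dictionaries, the temporal-difference form of the update, the parameters `alpha` and `gamma`, and the names `select_action`, `update`, and `is_allowed` are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable


def select_action(
    benefit: Dict[Tuple[State, Action], float],
    state: State,
    actions: Sequence[Action],
    is_allowed: Callable[[State, Action], bool],
) -> Action:
    """Exhaustively scan the action set one step ahead.

    Actions rejected by the (hypothetical) constraint predicate are skipped;
    among the rest, the action with the largest estimated benefit is returned.
    """
    allowed = [a for a in actions if is_allowed(state, a)]
    if not allowed:
        raise ValueError("no action satisfies the constraints")
    return max(allowed, key=lambda a: benefit.get((state, a), 0.0))


def update(
    benefit: Dict[Tuple[State, Action], float],
    value: Dict[State, float],
    state: State,
    action: Action,
    reward: float,
    next_state: State,
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Assumed update: move the benefit estimate toward the TD error.

    The TD error measures how much better the observed transition was than
    the current policy's value estimate predicted, which is one plausible
    reading of "benefit with respect to the current policy".
    """
    td_error = reward + gamma * value.get(next_state, 0.0) - value.get(state, 0.0)
    key = (state, action)
    old = benefit.get(key, 0.0)
    benefit[key] = old + alpha * (td_error - old)
    # TD(0) update of the state value under the current policy (assumption).
    value[state] = value.get(state, 0.0) + alpha * td_error
    return td_error
```

In this sketch a constraint such as an actuator limit would be expressed through `is_allowed`, so that forbidden actions are never considered during the search, whatever their estimated benefit.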
