Abstract
This paper proposes a Q-learning-based algorithm to solve the linear quadratic regulator (LQR) problem for two-dimensional (2D) discrete-time systems with unknown dynamics. First, based on a value function constructed within the Lyapunov function framework, the algebraic Riccati inequality (ARI) and the Bellman inequality for the LQR problem are derived. A suboptimal state feedback controller is then obtained from these inequalities, and an offline policy iteration algorithm based on semi-definite programming (SDP) is introduced. Building on this foundation, the Q-learning concept is introduced to transform the objective function and the Bellman inequality into a Q-function and its corresponding inequality. A Q-learning-based offline policy iteration equation is derived, and an online policy iteration algorithm based on Q-learning is then designed, in which data are collected online during each iteration to solve the LQR problem for 2D discrete systems with unknown dynamics. Finally, the effectiveness of the proposed control scheme is validated through two examples.
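To illustrate the policy-iteration idea underlying the Q-function approach, the following sketch applies it to a standard (one-dimensional) discrete-time LQR problem; the paper's 2D system structure and data-driven online estimation are not captured here. All system matrices (`A`, `B`, `Qc`, `Rc`) are hypothetical, and the dynamics are assumed known purely for this illustration.

```python
import numpy as np

# Hypothetical stable discrete-time system x_{k+1} = A x_k + B u_k
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Qc = np.eye(2)   # state weighting matrix
Rc = np.eye(1)   # input weighting matrix

n, m = B.shape
K = np.zeros((m, n))   # initial gain; valid because A itself is stable

for _ in range(50):
    # Policy evaluation: solve the Lyapunov equation
    #   P = (A - B K)^T P (A - B K) + Qc + K^T Rc K
    # by fixed-point iteration (converges since A - B K is stable).
    Acl = A - B @ K
    Qk = Qc + K.T @ Rc @ K
    P = np.zeros((n, n))
    for _ in range(500):
        P = Acl.T @ P @ Acl + Qk

    # Q-function blocks: Q(x, u) = [x; u]^T H [x; u] with
    #   H = [[Qc + A^T P A,  A^T P B],
    #        [B^T P A,       Rc + B^T P B]]
    Hxu = A.T @ P @ B
    Huu = Rc + B.T @ P @ B

    # Policy improvement: u = -K x with K = Huu^{-1} Hxu^T
    K = np.linalg.solve(Huu, Hxu.T)
```

After convergence, `K` is the optimal LQR gain and the closed-loop matrix `A - B K` is Schur stable. In the model-free setting the paper addresses, the blocks of `H` would instead be estimated from measured state and input data rather than computed from `A` and `B`.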
