Abstract
This paper proposes a Q-learning-based algorithm to solve the linear quadratic regulator (LQR) problem for two-dimensional (2D) discrete-time systems with unknown dynamics. First, based on a value function constructed within the Lyapunov function framework, the algebraic Riccati inequality (ARI) and the Bellman inequality for the LQR problem are derived. A suboptimal state feedback controller is then obtained from these inequalities, and an offline policy iteration algorithm based on semi-definite programming (SDP) is introduced. Building on this foundation, the Q-learning concept is introduced to transform the objective function and the Bellman inequality into a Q-function and its corresponding inequality. A Q-learning-based offline policy iteration equation is derived, and an online policy iteration algorithm based on Q-learning is then designed, in which data are collected online during each iteration to solve the LQR problem for 2D discrete systems with unknown dynamics. Finally, the effectiveness of the proposed control scheme is validated through two examples.
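To illustrate the policy-iteration idea underlying the Q-function approach, the following sketch applies it to a standard (one-dimensional) discrete-time LQR problem; the paper's 2D system structure and data-driven online estimation are not captured here. All system matrices (`A`, `B`, `Qc`, `Rc`) are hypothetical, and the dynamics are assumed known purely for this illustration.

```python
import numpy as np

# Hypothetical stable discrete-time system x_{k+1} = A x_k + B u_k
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Qc = np.eye(2)   # state weighting matrix
Rc = np.eye(1)   # input weighting matrix

n, m = B.shape
K = np.zeros((m, n))   # initial gain; valid because A itself is stable

for _ in range(50):
    # Policy evaluation: solve the Lyapunov equation
    #   P = (A - B K)^T P (A - B K) + Qc + K^T Rc K
    # by fixed-point iteration (converges since A - B K is stable).
    Acl = A - B @ K
    Qk = Qc + K.T @ Rc @ K
    P = np.zeros((n, n))
    for _ in range(500):
        P = Acl.T @ P @ Acl + Qk

    # Q-function blocks: Q(x, u) = [x; u]^T H [x; u] with
    #   H = [[Qc + A^T P A,  A^T P B],
    #        [B^T P A,       Rc + B^T P B]]
    Hxu = A.T @ P @ B
    Huu = Rc + B.T @ P @ B

    # Policy improvement: u = -K x with K = Huu^{-1} Hxu^T
    K = np.linalg.solve(Huu, Hxu.T)
```

After convergence, `K` is the optimal LQR gain and the closed-loop matrix `A - B K` is Schur stable. In the model-free setting the paper addresses, the blocks of `H` would instead be estimated from measured state and input data rather than computed from `A` and `B`.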
