Sage Journals: Discover world-class research

Abstract

This paper considers dual-rate systems, in which the output is measured at a relatively slow rate while the control signal is updated at a faster rate. The output sampling period is an integer multiple of the input sampling period. The paper examines dual-rate inferential control systems, which consist of a fast plant model, a fast single-rate regulator, and a periodic switch. Missing output samples are estimated using the fast single-rate model, and the single-rate control algorithm is then implemented at the fast sampling rate. The fast-sampling discrete-time model is derived from the plant’s continuous-time model using the first-order hold (FOH) element. A discrete LQ regulator with a prescribed degree of stability is proposed for this plant model: all closed-loop eigenvalues lie inside a circle of radius λ, where 0 < λ ≤ 1 is chosen by the designer. The gain matrix can be calculated offline, and an online method for calculating the regulator gain is also provided; it uses policy iteration, specifically Hewer’s algorithm. Finally, it is demonstrated that the presented inferential control system remains effective in the presence of multiplicative unmodeled dynamics. The main contributions of the paper are: (i) designing the LQ regulator with a prescribed degree of stability using reinforcement learning (RL) (generalized policy iteration); and (ii) analyzing the robust stability of the inferential control system in the presence of multiplicative unmodeled dynamics using the lifting technique.

Keywords

Inferential systems, LQ regulator, degree of stability, reinforcement learning

Introduction

Some industrial processes use digital control in which the input and output are sampled at different intervals.^1,2 These are known as multi-rate systems. Standard single-rate control techniques cannot be applied directly in such circumstances, which has led to significant interest in these systems.^3,4

Kranc proposed the switch decomposition method for controlling multi-rate systems. This method involves transforming multi-rate systems (specifically dual-rate systems) into single-rate systems. The method has been further developed under the name “lifting technique,”^5–7 which is now a standard tool for transforming periodically time-varying systems into time-invariant ones. This technique is particularly important for state-space models.

Additionally, a dual-rate model that uses all available data (fast input and slow output data) can be derived using the polynomial transformation technique.⁸ When stochastic disturbances have a non-Gaussian distribution, this method is used for the recursive identification of stochastic systems with unmodeled dynamics.⁹ However, these transformation strategies require identifying more parameters. This issue was addressed by employing an accelerated stochastic approximation approach and the Bayesian information criteria, which allowed for identifying a fast model with fewer parameters.¹⁰

In chemical processes, dual-rate systems are often used^11,12 and dual-rate techniques have recently been introduced in filtering theory.¹³ Inferential control is an effective method for controlling dual-rate systems.^14,15 This method first estimates the missing output samples using a fast single-rate model. Then, it applies a single-rate regulator at the same fast sampling rate.

The paper assumes that the plant has a known continuous-time state-space model. The plant’s fast discrete-time model is derived using the first-order hold (FOH) element.¹⁶ A linear quadratic (LQ) regulator can be designed for this model.¹⁷ Reference¹⁸ proposes designing a continuous-time LQ regulator that places all closed-loop poles in the half-plane Re{s} < −α, with α > 0 chosen by the designer. This approach provides greater tolerance for time delays and nonlinearities.

In this paper, we address the design of an LQ regulator with a specified degree of stability for discrete-time systems. The criterion ensures all closed-loop eigenvalues have magnitudes less than λ∈ (0,1]. The design procedure is typically completed offline.

We propose an online method for designing LQ regulators with a specified stability level, building on current active research¹⁹ and documented in relevant monographs.^20–24 This iterative procedure solves the algebraic Riccati equation, forming the basis for LQ regulator gains, using generalized policy iteration derived from control theory principles.^25,26 Generalized policy iteration optimizes control laws iteratively until converging to optimal solutions for various dynamical systems and cost functions. Recursive feasibility, robust stability, and near-optimality properties are explored using policy iteration.²⁷ Recent advancements in online policy iteration algorithms for optimal control in continuous-time systems with input constraints are discussed in.²⁸ The intersection of reinforcement learning with adaptive control is explored in.²⁹ Research references^30,31 address LQ regulator design under unknown linear system dynamics.

Reinforcement Learning (RL) is a broad area. Reference³² explores RL based on differential games. Optimal control applications in industries using RL are discussed in Reference.³³ Reference³⁴ examines RL’s impact on decision-making under uncertainty. Reference³⁵ describes a multi-agent system based on RL. Stochastic approximation algorithms and algorithms such as temporal-difference learning and Q-learning are detailed in Reference.³⁶ Further developments in RL are also discussed.^36,37

The problem considered in this paper falls under model-based RL algorithms using adaptive dynamic programming. To the best of the authors’ knowledge, this problem has not been addressed in the literature.

The paper examines the robust stability of a closed-loop system controlled by the proposed LQ regulator in the presence of unmodeled dynamics in the form of multiplicative uncertainty. Using the lifting technique,³⁸ it shows that the dual-rate inferential system is at least as robust as the corresponding fast single-rate system.

The main contributions of the paper are: (i) design of LQ regulators with a prescribed degree of stability for a linear discrete-time fast-rate model based on the FOH, and design of reinforcement learning LQ regulators based on generalized policy iteration; (ii) analysis of the robust stability of inferential systems in the presence of multiplicative unmodeled dynamics using lifting techniques.

Problem formulation

Consider a single-input, single-output, single-rate system shown in Figure 1.

Figure 1.

The single-rate system.

In Figure 1, P_c represents a continuous-time linear time-invariant (LTI) plant, H_h a zero-order hold (ZOH), and S_h an ideal sampler. Both H_h and S_h operate with sampling period h. The equivalent discrete-time model of P_c is

P = S_{h} P_{c} H_{h}

(1)

The standard discrete-time control system is then shown in Figure 2.

Figure 2.

The discrete time single-rate control system.

In Figure 2, K represents a controller. In practice, it is often impossible to sample the output as fast as the input is updated, owing to physical sensor constraints. Therefore, the sampler S_h in Figure 1 is replaced with a slower sampler S_hp with period ph, where p ≥ 2 is an integer. Figure 3 illustrates this situation.

The input-output data are:

(i) {u(kh): k = 0, 1, 2, …} at a fast rate,

(ii) {y(khp): k = 0, 1, 2, …} at a slow rate.

As a result, the intermediate output samples y(khp + jh), for j = 1, 2, …, p − 1, are not available. The dual-rate measurement is represented by the pair {u(kh), y(khp)}. The control system for this scenario is structured as shown in Figure 4.

In Figure 4, $\hat{P}$ represents a model of the fast single-rate system P, K is a fast single-rate regulator, and S denotes a switch. The fast output signal y_f(kh) consists of the slow-sampled output y(khp), available once every p fast sampling periods, together with the estimated output $\hat{y} (kh)$ from the model $\hat{P}$ at the intermediate instants.

Note that S_hp is equivalent to S_h followed by the periodic switch S. Therefore, Figure 3 can be redrawn as Figure 5.

Figure 3.

The dual-rate systems.

The signal $\hat{y} (kh)$ replaces the missing samples in y(khp). The feedback signal y_f (kh) is defined by the following relation:

y_{f} (kh) = \begin{cases} y (jph), & kh = jph, \; j = 0, 1, 2, \dots \\ \hat{y} ((jp + i) h), & kh = jph + ih, \; i = 1, 2, \dots, p - 1 \end{cases}

(2)
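The switching rule (2) can be sketched in a few lines of Python. This is a minimal illustration, assuming a scalar output and that both the slow measurements and the model estimates are already available as arrays; the function name `assemble_feedback` is ours, not the paper's.

```python
import numpy as np


def assemble_feedback(y_slow, y_hat, p):
    """Fast feedback sequence y_f(kh) of eq. (2) for a scalar output.

    y_slow : measurements y(jph), j = 0, 1, ... (one every p fast steps)
    y_hat  : model estimates yhat(kh), k = 0, 1, ... on the fast grid
    p      : integer ratio between the slow and fast sampling periods
    """
    y_f = np.asarray(y_hat, dtype=float).copy()      # intermediate instants: model output
    n_slow = len(range(0, len(y_f), p))              # slow instants kh = jph on this grid
    y_f[::p] = np.asarray(y_slow, dtype=float)[:n_slow]  # measured samples take priority
    return y_f


# p = 3: entries 0 and 3 come from the sensor, the others from the model.
print(assemble_feedback([1.0, 2.0], [0.0, 0.1, 0.2, 0.3, 0.4, 0.5], p=3))
```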

Finally, inferential control in Figure 4 consists of a fast-rate plant model, a fast single-rate regulator, and a periodic switch S.

Figure 4.

The sampled-data inferential control system.

From Figure 5, in the absence of unmodeled dynamics, $\hat{P} = P$, and the dual-rate system is equivalent to the single-rate system in Figure 2. Therefore, the LQ regulator design is based only on the fast-rate model. The presence of unmodeled dynamics is considered later.

Figure 5.

Modified dual-rate system.
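The equivalence stated above can be checked numerically. The sketch below is a hypothetical state-feedback variant of the inferential loop: it assumes the full state x is measured at the slow instants kph (Figure 4 feeds back the output y), and the matrices A, B, K are illustrative values, not taken from the paper. With an exact model the dual-rate trajectory coincides with the fast single-rate one (p = 1); with a mismatched model it does not.

```python
import numpy as np


def simulate_inferential(A, B, A_hat, B_hat, K, x0, p, n_steps):
    """Closed loop of a dual-rate inferential scheme with state feedback.

    Hypothetical variant: the plant state is available only at k = 0, p, 2p, ...;
    in between, the model (A_hat, B_hat) propagates the estimate used by the
    fast regulator u = -K x_f. Returns the plant state trajectory.
    """
    x = np.asarray(x0, dtype=float)
    x_est = x.copy()
    traj = [x.copy()]
    for k in range(n_steps):
        x_f = x.copy() if k % p == 0 else x_est  # switch S: measurement or estimate
        u = -K @ x_f                             # fast single-rate regulator
        x_est = A_hat @ x_f + B_hat @ u          # model prediction for step k + 1
        x = A @ x + B @ u                        # true plant
        traj.append(x.copy())
    return np.array(traj)


A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
K = np.array([[10.0, 5.0]])
dual = simulate_inferential(A, B, A, B, K, [1.0, 0.0], p=3, n_steps=30)
fast = simulate_inferential(A, B, A, B, K, [1.0, 0.0], p=1, n_steps=30)
print(np.allclose(dual, fast))  # exact model: dual-rate loop matches fast loop
```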

Fast model of the plant based on FOH method

Suppose the continuous-time model of the plant P_c in state-space form is:

\overset{\cdot}{x} (t) = A_{c} x (t) + B_{c} u (t)

(3)

We now derive the discrete-time model $\hat{P}$ using the first-order hold (FOH) method. The FOH provides a more accurate discrete approximation of the continuous-time model by linearly extrapolating from the current and previous input samples:

u (t) = u (kh) + \frac{u (kh) - u ((k - 1) h)}{h} (t - kh), \quad kh \leq t < (k + 1) h

(4)
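Relation (4) can be evaluated directly. The helper below is an illustrative sketch; the name `foh_hold` and the start-up convention u(−h) = 0 are our assumptions, since (4) does not specify the value before the first sample.

```python
import numpy as np


def foh_hold(u_samples, h, t):
    """Continuous-time input u(t) produced by the extrapolating FOH of eq. (4).

    On kh <= t < (k+1)h the hold continues the straight line through
    u((k-1)h) and u(kh). The value u(-h) = 0 used for k = 0 is an assumption.
    """
    u = np.asarray(u_samples, dtype=float)
    k = int(np.floor(t / h))
    u_prev = u[k - 1] if k >= 1 else 0.0
    return u[k] + (u[k] - u_prev) / h * (t - k * h)


# Samples u(0), u(h), u(2h) = 0, 1, 3 with h = 1: on [2, 3) the slope is 3 - 1 = 2.
print(foh_hold([0.0, 1.0, 3.0], h=1.0, t=2.5))
```

Because the hold extrapolates, its output is generally discontinuous at the sampling instants: just before t = (k + 1)h it reaches 2u(kh) − u((k − 1)h), which need not equal u((k + 1)h).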

With instantaneous sampling, the sampler generates the discrete-time sequence:

z_{k} = z (kh)

(5)

The FOH dynamics are illustrated in Figure 6.

Figure 6.

Impulse response of FOH.

The fast model $P_{F} = \hat{P}$ , based on the FOH approach, has the following form¹⁶

\begin{matrix} [\begin{matrix} x ((k + 1) h) \\ u (kh) \end{matrix}] = [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] [\begin{matrix} x (kh) \\ u ((k - 1) h) \end{matrix}] + [\begin{matrix} B_{2} \\ I \end{matrix}] u (kh) \\ = [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] x_{p} (kh) + [\begin{matrix} B_{2} \\ I \end{matrix}] u (kh) \end{matrix}

(6)

where

A = e^{A_{c} h}

B_{1} = \int_{0}^{h} (\frac{η}{h} - 1) e^{A_{c} η} B_{c} d η

B_{2} = \int_{0}^{h} (2 - \frac{η}{h}) e^{A_{c} η} B_{c} d η
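The integrals defining B₁ and B₂ need not be evaluated by quadrature. One convenient route, which is our choice rather than a method stated in the paper, is a single block-matrix exponential: with $F_{1} = \int_{0}^{h} e^{A_{c} η} B_{c} d η$ and $F_{2} = \int_{0}^{h} (h - η) e^{A_{c} η} B_{c} d η$, the definitions above give $B_{1} = - F_{2} / h$ and $B_{2} = F_{1} + F_{2} / h$. The sketch assumes NumPy and SciPy; `foh_model` and `augmented` are hypothetical helper names, and the usage applies them to the two-mass model (46) of the illustrative example.

```python
import numpy as np
from scipy.linalg import expm


def foh_model(Ac, Bc, h):
    """Return A, B1, B2 of model (6) for the extrapolating FOH (4).

    The top block row of expm(M h), with M = [[Ac, Bc, 0], [0, 0, I], [0, 0, 0]],
    is [e^{Ac h}, F1, F2], where F1 = int_0^h e^{Ac eta} Bc d eta and
    F2 = int_0^h (h - eta) e^{Ac eta} Bc d eta.
    """
    Ac = np.atleast_2d(np.asarray(Ac, dtype=float))
    n = Ac.shape[0]
    Bc = np.asarray(Bc, dtype=float).reshape(n, -1)
    m = Bc.shape[1]
    M = np.zeros((n + 2 * m, n + 2 * m))
    M[:n, :n] = Ac
    M[:n, n:n + m] = Bc
    M[n:n + m, n + m:] = np.eye(m)
    E = expm(M * h)
    A, F1, F2 = E[:n, :n], E[:n, n:n + m], E[:n, n + m:]
    return A, -F2 / h, F1 + F2 / h  # A, B1, B2


def augmented(A, B1, B2):
    """Matrices of (6) acting on x_p(kh) = [x(kh); u((k-1)h)]."""
    n, m = B1.shape
    AF = np.block([[A, B1], [np.zeros((m, n)), np.zeros((m, m))]])
    BF = np.vstack([B2, np.eye(m)])
    return AF, BF


# Two-mass example (46) with k = m1 = m2 = 1 and h = 1.
Ac = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [-1, 1, 0, 0], [1, -1, 0, 0]], dtype=float)
Bc = np.array([[0.0], [0.0], [1.0], [0.0]])
AF, BF = augmented(*foh_model(Ac, Bc, h=1.0))
print(AF.shape, BF.shape)
```

As a sanity check, for the integrator A_c = 0, B_c = 1 the definitions give B₁ = −h/2 and B₂ = 3h/2, and in general B₁ + B₂ equals the ZOH input matrix F₁.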

This yields the block diagram of the FOH combined with system (6).

LQ with prescribed degree of stability for fast discrete time model

We now introduce a regulator design criterion, (7), that incorporates the closed-loop pole constraint: it ensures that all closed-loop eigenvalues have magnitudes less than λ ∈ (0, 1]. For system (6), restated here for convenience,

[\begin{matrix} x ((k + 1) h) \\ u (kh) \end{matrix}] = [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] [\begin{matrix} x (kh) \\ u ((k - 1) h) \end{matrix}] + [\begin{matrix} B_{2} \\ I \end{matrix}] u (kh)

the performance index is:

V ([\begin{matrix} x (kh) \\ u ((k - 1) h) \end{matrix}]) = \sum_{k = k_{0}}^{\infty} λ^{- 2 kh} ({[\begin{matrix} x (kh) \\ u ((k - 1) h) \end{matrix}]}^{T} Q [\begin{matrix} x (kh) \\ u ((k - 1) h) \end{matrix}] + u^{T} (kh) R u (kh))

(7)

The following theorem is now developed.

Theorem 1. Consider system (6) with criterion (7), and suppose that the following conditions hold:

1) $([\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}], [\begin{matrix} B_{2} \\ I \end{matrix}])$ is completely controllable.

2) Let H be any matrix such that Q = H^T H. The pair $([\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}], H)$ is completely observable.

3) $Q = Q^{T} \geq 0$ .

4) $R = R^{T} > 0$ .

5) The prescribed degree of stability for system (6) is λ ∈ (0, 1].

Then,

u (kh) = - K [\begin{matrix} x (kh) \\ u ((k - 1) h) \end{matrix}]

where:

K = λ^{- 2 h} {(R + λ^{- 2 h} {[\begin{matrix} B_{2} \\ I \end{matrix}]}^{T} P [\begin{matrix} B_{2} \\ I \end{matrix}])}^{- 1} {[\begin{matrix} B_{2} \\ I \end{matrix}]}^{T} P [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}]

and P is the solution of the following algebraic Riccati equation:

\begin{matrix} {(λ^{- h} [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] - λ^{- h} [\begin{matrix} B_{2} \\ I \end{matrix}] K)}^{T} P (λ^{- h} [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] - λ^{- h} [\begin{matrix} B_{2} \\ I \end{matrix}] K) \\ - P + Q + K^{T} RK = 0 \end{matrix}

Proof: Let us introduce

λ^{- kh} [\begin{matrix} x (kh) \\ u ((k - 1) h) \end{matrix}] = [\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}]

(8)

\hat{u} (kh) = λ^{- kh} u (kh)

(9)

From relations (7)–(9) it follows that:

\hat{V} ([\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}]) = \sum_{k = k_{0}}^{\infty} ({[\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}]}^{T} Q [\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}] + {\hat{u}}^{T} (kh) R \hat{u} (kh))

(10)

From equations (6), (8), and (9) we have:

[\begin{matrix} \hat{x} ((k + 1) h) \\ λ^{- h} \hat{u} (kh) \end{matrix}] = λ^{- h} [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] [\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}] + λ^{- h} [\begin{matrix} B_{2} \\ I \end{matrix}] \hat{u} (kh)

(11)

owing to the fact that:

λ^{- (k + 1) h} [\begin{matrix} x ((k + 1) h) \\ u (kh) \end{matrix}] = [\begin{matrix} \hat{x} ((k + 1) h) \\ λ^{- h} \hat{u} (kh) \end{matrix}]

(12)

The optimal performance index has the form:

\hat{V} (\cdot) = {[\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}]}^{T} P [\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}]

(13)

where P is a symmetric matrix.

The Bellman equation for our case is:

\begin{matrix} {[\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}]}^{T} P [\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}] = {[\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}]}^{T} \\ Q [\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}] + {\hat{u}}^{T} (kh) R \hat{u} (kh) + {[\begin{matrix} \hat{x} ((k + 1) h) \\ λ^{- h} \hat{u} (kh) \end{matrix}]}^{T} \\ P [\begin{matrix} \hat{x} ((k + 1) h) \\ λ^{- h} \hat{u} (kh) \end{matrix}] \end{matrix}

(14)

For the last term in relation (14), using relation (11) gives:

\begin{matrix} {(λ^{- h} [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] [\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}] + λ^{- h} [\begin{matrix} B_{2} \\ I \end{matrix}] \hat{u} (kh))}^{T} \\ P (λ^{- h} [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] [\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}] + λ^{- h} [\begin{matrix} B_{2} \\ I \end{matrix}] \hat{u} (kh)) \\ = λ^{- 2 h} {[\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}]}^{T} {[\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}]}^{T} P [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] [\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}] \\ + 2 λ^{- 2 h} {[\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}]}^{T} {[\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}]}^{T} P [\begin{matrix} B_{2} \\ I \end{matrix}] \hat{u} (kh) \\ + λ^{- 2 h} {\hat{u}}^{T} (kh) {[\begin{matrix} B_{2} \\ I \end{matrix}]}^{T} P [\begin{matrix} B_{2} \\ I \end{matrix}] \hat{u} (kh) \end{matrix}

(15)

According to (13) we have:

\begin{matrix} \hat{V} ([\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}]) = {[\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}]}^{T} \\ P [\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}] \end{matrix}

(16)

Substituting (15) and (16) into (14), the minimizing control follows from the stationarity condition:

\frac{\partial \hat{V} ([\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}])}{\partial \hat{u} (kh)} = 0

(17)

we have:

\begin{matrix} \hat{u} (kh) = - λ^{- 2 h} {(R + λ^{- 2 h} {[\begin{matrix} B_{2} \\ I \end{matrix}]}^{T} P [\begin{matrix} B_{2} \\ I \end{matrix}])}^{- 1} {[\begin{matrix} B_{2} \\ I \end{matrix}]}^{T} \\ P [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] [\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}] = - K [\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}] \end{matrix}

(18)

where:

K = λ^{- 2 h} {(R + λ^{- 2 h} {[\begin{matrix} B_{2} \\ I \end{matrix}]}^{T} P [\begin{matrix} B_{2} \\ I \end{matrix}])}^{- 1} {[\begin{matrix} B_{2} \\ I \end{matrix}]}^{T} P [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}]

(19)

Substituting (18) into (14) gives the equality:

{[\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}]}^{T} T [\begin{matrix} \hat{x} (kh) \\ λ^{- h} \hat{u} ((k - 1) h) \end{matrix}] = 0

(20)

Since this must hold for every state and T is symmetric, the matrix T must vanish:

T = 0

(21)

Here, the condition T = 0 takes the form:

\begin{matrix} {(λ^{- h} [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] - λ^{- h} [\begin{matrix} B_{2} \\ I \end{matrix}] K)}^{T} P (λ^{- h} [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] - λ^{- h} [\begin{matrix} B_{2} \\ I \end{matrix}] K) \\ - P + Q + K^{T} RK = 0 \end{matrix}

(22)

Substituting (19) into (22) and rearranging gives:

\begin{matrix} λ^{- 2 h} {[\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}]}^{T} P [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] - P + Q - λ^{- 4 h} {[\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}]}^{T} P [\begin{matrix} B_{2} \\ I \end{matrix}] \\ \cdot {(R + λ^{- 2 h} {[\begin{matrix} B_{2} \\ I \end{matrix}]}^{T} P [\begin{matrix} B_{2} \\ I \end{matrix}])}^{- 1} {[\begin{matrix} B_{2} \\ I \end{matrix}]}^{T} P [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] = 0 \end{matrix}

(23)

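By (8) and (9), both sides of (18) are the untransformed quantities multiplied by λ^{−kh}, so (18) is equivalent to $u (kh) = - K [\begin{matrix} x (kh) \\ u ((k - 1) h) \end{matrix}]$. Together with (19) and (23), this completes the proof of the theorem.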

LQ regulator design is described by Algorithm 1.

Algorithm 1

1. Choose the matrices $Q = Q^{T} \geq 0$, $R = R^{T} > 0$, λ, and h.

2. Solve the algebraic Riccati equation (23) for P.

3. Determine the regulator gain K from relation (19).

This design procedure is carried out offline.
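Because (23) is the standard discrete algebraic Riccati equation for the scaled pair (λ^{−h}A_F, λ^{−h}B_F), where A_F and B_F denote the augmented matrices of (6), Algorithm 1 can be sketched with SciPy's `solve_discrete_are`. The function names and the 2-state test model below are illustrative assumptions, not the paper's data. In this scaling, the closed-loop spectral radius of A_F − B_F K should fall below λ^h, which equals λ for h = 1.

```python
import numpy as np
from scipy.linalg import solve_discrete_are


def lq_prescribed_stability(AF, BF, Q, R, lam, h):
    """Algorithm 1 sketch: P from Riccati equation (23), K from (19).

    (23) is the standard DARE for the scaled pair (lam**-h AF, lam**-h BF),
    so SciPy's solver returns its stabilizing solution P.
    """
    s = lam ** (-h)
    P = solve_discrete_are(s * AF, s * BF, Q, R)
    g = lam ** (-2 * h)
    K = g * np.linalg.solve(R + g * BF.T @ P @ BF, BF.T @ P @ AF)
    return K, P


def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return np.max(np.abs(np.linalg.eigvals(M)))


# Illustrative augmented pair (not from the paper), lam = 0.5, h = 1.
AF = np.array([[1.1, 0.5], [0.0, 0.0]])
BF = np.array([[0.3], [1.0]])
K, P = lq_prescribed_stability(AF, BF, np.eye(2), np.array([[1.0]]), lam=0.5, h=1.0)
print(spectral_radius(AF - BF @ K))  # expected to lie below lam**h = 0.5
```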

Robust stability of dual-rate systems

In this section, we consider the robust stability of the dual-rate system. We now suppose that Assumption 1 does not hold. We treat $\hat{P}$ as the nominal model and consider multiplicative uncertainty. The uncertainty class is given by³⁹:

P_{Δ} (z) = \hat{P} (z) (I + W_{1} (z) Δ (z) W_{2} (z))

(24)

where $Δ (z)$ is a perturbation and $W_{1} (z)$ and $W_{2} (z)$ are fixed frequency-weighting filters. Note that the sampling period is h for the input signal and ph for the state signal.

The standard technique for analysis and design of multi-rate systems is lifting.³⁸ The lifting technique transforms periodically time-varying systems into time-invariant systems. Let u(kh) be a discrete-time signal defined on the set {0, 1, 2, …}.

u (kh) = \{u (0), u (h), u (2 h), \dots, u (kh), \dots\}

The lifting operator L is the map from $u (kh)$ to $\underline{u} (kh)$

L : u (kh) \to \underline{u} (kh)

(25)

where:

\underline{u} (kh) = \{ [\begin{matrix} u (0) \\ u (h) \\ ⋮ \\ u ((p - 1) h) \end{matrix}], [\begin{matrix} u (ph) \\ u (ph + h) \\ ⋮ \\ u ((2 p - 1) h) \end{matrix}], \dots, [\begin{matrix} u (kph) \\ u (kph + h) \\ ⋮ \\ u (kph + (p - 1) h) \end{matrix}], \dots \}

(26)
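For a scalar fast sequence, the lifting operator (25)–(26) amounts to reshaping the samples into blocks of length p. This sketch assumes NumPy arrays and drops an incomplete final block; `lift` and `unlift` are hypothetical names.

```python
import numpy as np


def lift(u, p):
    """Lifting operator L of (25)-(26) for a scalar fast sequence u(kh).

    Row k of the result is [u(kph), u(kph + h), ..., u(kph + (p-1)h)].
    An incomplete trailing block is dropped (a choice made for this sketch).
    """
    u = np.asarray(u, dtype=float)
    n_blocks = len(u) // p
    return u[: n_blocks * p].reshape(n_blocks, p)


def unlift(u_lifted):
    """Inverse map L^{-1}: concatenate the blocks into the fast sequence."""
    return np.asarray(u_lifted).reshape(-1)


print(lift(np.arange(7.0), p=3))  # two blocks of length 3; the sample u(6h) is dropped
```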

We can now formulate a theorem for lifted systems.

Theorem 2. Consider the system (6). Suppose the following:

1) $T_{1} = h$ is the sampling period for input ${u (\cdot)}$ .

2) $T_{2} = ph, p > 1$ is the sampling period for state ${x_{p}}$ .

Then,

A. The lifted system corresponding to system (6) is:

x_{p} ((k + 1) ph) = \underline{A} x_{p} (kph) + \underline{B} \underline{u} (kph)

where:

x_{p} ((k + 1) ph) = [\begin{matrix} x ((k + 1) ph) \\ u ((k + 1) ph - h) \end{matrix}], \quad \underline{u} (kph) = [\begin{matrix} u (kph) \\ u (kph + h) \\ ⋮ \\ u (kph + (p - 1) h) \end{matrix}]

\underline{A} = A_{F}^{p}, \quad \underline{B} = [\begin{matrix} A_{F}^{p - 1} B_{F} & A_{F}^{p - 2} B_{F} & \dots & B_{F} \end{matrix}], \quad A_{F} = [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}], \quad B_{F} = [\begin{matrix} B_{2} \\ I \end{matrix}]

B. The lifted regulator has a gain:

\underline{K} = λ^{- 2 ph} {(R + λ^{- 2 ph} {\underline{B}}^{T} \underline{P} \underline{B})}^{- 1} {\underline{B}}^{T} \underline{P} \underline{A}

Proof: Relation (6) can be rewritten as:

x_{p} ((k + 1) h) = A_{F} x_{p} (kh) + B_{F} u (kh)

Let us replace k with $kp$ . We have:

\begin{matrix} x_{p} (kph + h) = A_{F} x_{p} (kph) + B_{F} u (kph) \\ x_{p} (kph + 2 h) = A_{F} x_{p} (kph + h) + B_{F} u (kph + h) = A_{F} (A_{F} x_{p} (kph) + B_{F} u (kph)) + B_{F} u (kph + h) \\ = A_{F}^{2} x_{p} (kph) + A_{F} B_{F} u (kph) + B_{F} u (kph + h) \\ x_{p} (kph + 3 h) = A_{F} x_{p} (kph + 2 h) + B_{F} u (kph + 2 h) \\ = A_{F}^{3} x_{p} (kph) + A_{F}^{2} B_{F} u (kph) + A_{F} B_{F} u (kph + h) + B_{F} u (kph + 2 h) \\ ⋮ \\ x_{p} (kph + ph) = A_{F}^{p} x_{p} (kph) + A_{F}^{p - 1} B_{F} u (kph) + A_{F}^{p - 2} B_{F} u (kph + h) + \dots + B_{F} u (kph + (p - 1) h) \end{matrix}

(27)

This can be written compactly as:

x_{p} (kph + ph) = A_{F}^{p} x_{p} (kph) + [\begin{matrix} A_{F}^{p - 1} B_{F} & A_{F}^{p - 2} B_{F} & \dots & B_{F} \end{matrix}] [\begin{matrix} u (kph) \\ u (kph + h) \\ ⋮ \\ u (kph + (p - 1) h) \end{matrix}] = \underline{A} x_{p} (kph) + \underline{B} \underline{u} (kph)

(28)
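Relation (28) can be verified numerically by comparing p fast steps of (6) with one lifted step. The matrices below are random placeholders rather than an FOH model, and a scalar input is assumed; `lift_system` and `step_fast` are illustrative names.

```python
import numpy as np


def lift_system(AF, BF, p):
    """Lifted pair of (28): A_lift = AF^p, B_lift = [AF^(p-1) BF, ..., AF BF, BF]."""
    A_lift = np.linalg.matrix_power(AF, p)
    B_lift = np.hstack([np.linalg.matrix_power(AF, p - 1 - i) @ BF for i in range(p)])
    return A_lift, B_lift


def step_fast(AF, BF, x, u_block):
    """Apply (6) once per fast input sample in u_block (scalar input)."""
    for u in u_block:
        x = AF @ x + BF[:, 0] * u
    return x


rng = np.random.default_rng(0)
AF, BF, p = rng.normal(size=(3, 3)), rng.normal(size=(3, 1)), 4
A_lift, B_lift = lift_system(AF, BF, p)
x0, u_block = rng.normal(size=3), rng.normal(size=p)
print(np.allclose(step_fast(AF, BF, x0, u_block), A_lift @ x0 + B_lift @ u_block))
```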

This proves statement A. Statement B follows from the solution of the Riccati equation when the matrices $[\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}]$ and $[\begin{matrix} B_{2} \\ I \end{matrix}]$ are replaced with $\underline{A}$ and $\underline{B}$ and the period h is replaced with ph. For the LQ design criterion, we use:

λ^{- khp} [\begin{matrix} x (kph) \\ u ((k - 1) ph) \end{matrix}] = [\begin{matrix} \hat{x} (kph) \\ λ^{- ph} \hat{u} ((k - 1) ph) \end{matrix}]

(29)

\hat{u} (kph) = λ^{- kph} u (kph)

(30)

This completes the proof. Following Reference,³⁸ the lifted weighting filters and perturbation are:

{\underline{W}}_{1} (z) = L W_{1} (z) L^{- 1}, \quad {\underline{W}}_{2} (z) = L W_{2} (z) L^{- 1}, \quad \underline{Δ} (z) = L Δ (z) L^{- 1}

(31)
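For a fast LTI filter given in state-space form, the lifted operator L W L⁻¹ in (31) has a standard realization: the state matrices follow (28), the output map stacks C, CA, …, CA^{p−1}, and the feedthrough is a lower-triangular Toeplitz matrix of Markov parameters. The sketch below assumes a SISO realization (A, B, C, D) of the filter (these symbols are local to the sketch, not the plant matrices) with random placeholder values; the helper names are hypothetical. It checks that the lifted system reproduces the lifted fast output.

```python
import numpy as np


def lift_lti(A, B, C, D, p):
    """State-space realization of the lifted SISO system L G L^{-1}.

    G: x+ = A x + B u, y = C x + D u at period h (B: n x 1, C: 1 x n, D scalar).
    The lifted system runs at period ph with p inputs and p outputs.
    """
    Ap = [np.linalg.matrix_power(A, k) for k in range(p + 1)]
    A_l = Ap[p]
    B_l = np.hstack([Ap[p - 1 - j] @ B for j in range(p)])
    C_l = np.vstack([C @ Ap[i] for i in range(p)])
    D_l = np.zeros((p, p))
    for i in range(p):
        D_l[i, i] = D                                  # direct feedthrough
        for j in range(i):
            D_l[i, j] = (C @ Ap[i - j - 1] @ B).item()  # Markov parameters
    return A_l, B_l, C_l, D_l


def simulate_ss(A, B, C, D, x0, inputs):
    """Simulate x+ = A x + B u, y = C x + D u; each row of inputs is one u."""
    x, D = np.asarray(x0, dtype=float), np.atleast_2d(D)
    ys = []
    for u in inputs:
        ys.append(C @ x + D @ u)
        x = A @ x + B @ u
    return np.array(ys)


rng = np.random.default_rng(1)
A, B, C, D, p = 0.5 * rng.normal(size=(3, 3)), rng.normal(size=(3, 1)), rng.normal(size=(1, 3)), 0.7, 3
x0, u = rng.normal(size=3), rng.normal(size=(4 * p, 1))
y_fast = simulate_ss(A, B, C, D, x0, u)
y_lift = simulate_ss(*lift_lti(A, B, C, D, p), x0, u.reshape(-1, p))
print(np.allclose(y_fast.reshape(-1, p), y_lift))  # lifted output equals lifted fast output
```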

Finally, we determine the lifted transfer function $\underline{\hat{P}} (z)$, defined by:

X_{p} (z) = \underline{\hat{P}} (z) \underline{U} (z)

(32)

From relations (28) and (32), it follows:

\underline{\hat{P}} (z) = {(zI - \underline{A})}^{- 1} \underline{B}

(33)

We now present the model of the switch S. Following Reference,¹⁵ the sampled-data inferential system is represented as shown in Figure 7.

Figure 7.

The sampled-data inferential system; H_F,h is the FOH D/A converter and $\hat{P}$ = P_F is the fast plant model.

R₁ and R₂ are static systems with the following matrix forms:

R_{1} = {[\begin{matrix} 1 & 0 & \dots & 0 \\ 0 & 0 & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & 0 \end{matrix}]}_{p \times p}

(34)

R_{2} = {[\begin{matrix} 0 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & 1 \end{matrix}]}_{p \times p}

(35)

The feedback signal $x_{F} (\cdot)$ is defined as:

x_{F} (\cdot) = R_{1} x_{p} (\cdot) + R_{2} {\hat{x}}_{p} (\cdot)

(36)

Note that:

R_{1} + R_{2} = I

(37)

According to Assumption 1, along with relations (24), (34)–(37) and Figures 1 and 8, the following figure can be derived:

Figure 8.

The model of the switch in the sampled data inferential system.

Our main goal is to study the robust stability of the dual-rate system in Figure 9. To apply the small-gain theorem,³⁹ we convert the system in Figure 9 to the $M - Δ$ form,³⁹ as shown in Figure 10.

Figure 9.

The lifted inferential control system with multiplicative uncertainty.

Figure 10.

The $M - Δ$ structure for system in Figure 9.

We now determine the matrix $\underline{M}$; the result is stated as a theorem.

Theorem 3. Let us consider the system presented in Figure 9. Suppose the following assumptions hold:

1) The perturbation $\underline{Δ} (z)$ is a stable linear time-invariant system with $‖ \underline{Δ} (z) ‖_{\infty} < 1$.

2) ${\underline{W}}_{1} (z)$ and ${\underline{W}}_{2} (z)$ are fixed frequency-weighting filters that are stable linear time-invariant systems.

Then, the matrix $\underline{M} (z)$ takes the form:

\underline{M} (z) = - {\underline{W}}_{2} (z) {(I + \underline{K} \underline{\hat{P}} (z))}^{- 1} \underline{K} R_{1} {\underline{W}}_{1} (z) \underline{\hat{P}} (z)

Proof: From Figure 9 we derive the expression as follows:

{\underline{U}}_{Δ} (z) = {\underline{W}}_{2} (z) \underline{U} (z)

(38)

\underline{U} (z) = - \underline{K} X_{F} (z)

(39)

It is also noted that:

{\hat{X}}_{p} (z) = - \underline{\hat{P}} (z) \underline{U} (z)

(40)

{\hat{X}}_{p} (z) = {\underline{W}}_{1} (z) \hat{P} (z) X_{p Δ} (z)

(41)

Using relation (37), we obtain for the feedback signal $X_{F} (z)$ :

\begin{matrix} X_{F} (z) = R_{1} {\hat{X}}_{p} (z) + (R_{1} + R_{2}) \underline{\hat{P}} (z) \underline{U} (z) \\ = R_{1} {\underline{W}}_{1} (z) \underline{\hat{P}} (z) X_{p Δ} (z) + \underline{\hat{P}} (z) \underline{U} (z) \end{matrix}

(42)

Based on relation (39), it follows that:

\underline{U} (z) = - \underline{K} R_{1} {\underline{W}}_{1} (z) \underline{\hat{P}} (z) X_{p Δ} (z) - \underline{K} \underline{\hat{P}} (z) \underline{U} (z)

(43)

Then we have:

\underline{U} (z) = - {(I + \underline{K} \underline{\hat{P}} (z))}^{- 1} \underline{K} R_{1} {\underline{W}}_{1} (z) \underline{\hat{P}} (z) X_{p Δ} (z)

(44)

Substituting (44) into (38) gives:

{\underline{U}}_{Δ} (z) = - {\underline{W}}_{2} (z) {(I + \underline{K} \underline{\hat{P}} (z))}^{- 1} \underline{K} R_{1} {\underline{W}}_{1} (z) \underline{\hat{P}} (z) X_{p Δ} (z)

(45)

The theorem is thus proved. Finally, we formulate a theorem for robust stability for the dual-rate system.

Theorem 4. Suppose that the assumptions of Theorem 3 are satisfied, along with the following assumption:

1) System is nominally stable (K stabilizes $\hat{P} (z)$ ).

Then,

A. For the dual-rate system to be stable for all admissible perturbations $Δ$, it is sufficient that:

‖ {\underline{W}}_{2} (z) {(I + \underline{K} \underline{\hat{P}} (z))}^{- 1} \underline{K} R_{1} {\underline{W}}_{1} (z) \underline{\hat{P}} (z) ‖_{\infty} < 1

B. The fast single-rate control is no more robust than the dual-rate inferential system.

Proof: The proof is based on small gain theorem and manipulation with matrix norms, similar to Proposition 2 and Corollary 1 in Reference.¹⁵
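The sufficient condition of Theorem 4 can be screened numerically by evaluating the frequency response on the unit circle. The sketch below estimates an H∞ norm by frequency gridding; it assumes real-coefficient systems (so ω ∈ [0, π] suffices), takes M as a user-supplied callable, and uses scalar stand-ins instead of the lifted $\underline{M} (z)$, whose filters W₁ and W₂ are not specified here. The name `hinf_norm_estimate` is hypothetical. A grid only yields a lower bound on the true norm, so a value below 1 is supporting evidence rather than a proof.

```python
import numpy as np


def hinf_norm_estimate(M, n_freq=4001):
    """Estimate ||M||_inf = sup_w sigma_max(M(e^{jw})) on a frequency grid.

    M is a callable returning the (matrix) frequency response at a complex z.
    Only w in [0, pi] is scanned, which suffices for real-coefficient systems.
    The result is a lower bound: the true peak may lie between grid points.
    """
    omegas = np.linspace(0.0, np.pi, n_freq)
    return max(np.linalg.norm(np.atleast_2d(M(np.exp(1j * w))), 2) for w in omegas)


# Scalar stand-in for the small-gain test: M(z) = 0.4 / (z - 0.5) peaks at z = 1.
print(hinf_norm_estimate(lambda z: 0.4 / (z - 0.5)))
```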

Model-based policy iteration algorithm for LQ regulator design

In this section, we discuss an online approach to regulator design that uses Hewer’s algorithm to solve the discrete-time Riccati equation.²⁵ This method is rooted in reinforcement learning. Hewer’s algorithm is known to converge under stabilizability and detectability assumptions when it is started from a stabilizing gain.²⁵

Reinforcement learning suggests generalized policy iteration,¹⁹ in which each iteration j performs l evaluation steps of the matrix equation before the gain is updated. For l = 1 this corresponds to value iteration, and for l = ∞ to policy iteration. The algorithm, based on equations (19) and (22), is summarized in Algorithm 2.

Algorithm 2

1. Select matrices $Q = Q^{T} \geq 0$, $R = R^{T} > 0$, an initial gain K₀ (not necessarily stabilizing), $P_{0} = I$, ε > 0, λ, h > 0, and the number of evaluation steps l ≥ 1; set j = 0.

2. Set $P_{j}^{0} = P_{j}$ and, for i = 0, 1, …, l − 1, compute

P_{j}^{i + 1} = {(λ^{- h} [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] - λ^{- h} [\begin{matrix} B_{2} \\ I \end{matrix}] K_{j})}^{T} P_{j}^{i} (λ^{- h} [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}] - λ^{- h} [\begin{matrix} B_{2} \\ I \end{matrix}] K_{j}) + Q + K_{j}^{T} R K_{j}

then set $P_{j + 1} = P_{j}^{l}$.

3. $K_{j + 1} = λ^{- 2 h} {(R + λ^{- 2 h} {[\begin{matrix} B_{2} \\ I \end{matrix}]}^{T} P_{j + 1} [\begin{matrix} B_{2} \\ I \end{matrix}])}^{- 1} {[\begin{matrix} B_{2} \\ I \end{matrix}]}^{T} P_{j + 1} [\begin{matrix} A & B_{1} \\ 0 & 0 \end{matrix}]$.

4. Stop if $‖ K_{j + 1} - K_{j} ‖ < ε$; otherwise, set j = j + 1 and return to step 2.
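Algorithm 2 can be sketched in NumPy as follows. The function name, the iteration cap `max_iter`, and the scalar test case (a = 2, b = q = r = 1, λ = 1, whose optimal gain is the golden ratio (1 + √5)/2) are our assumptions for illustration; the scalar pair is a generic system, not an FOH model.

```python
import numpy as np


def generalized_policy_iteration(AF, BF, Q, R, K0, lam, h, l=1, eps=5e-4, max_iter=10_000):
    """Sketch of Algorithm 2 (model-based generalized policy iteration).

    l = 1 behaves like value iteration; large l approaches Hewer's policy iteration.
    Returns the final gain K and matrix P.
    """
    s, g = lam ** (-h), lam ** (-2 * h)
    P = np.eye(AF.shape[0])                       # step 1: P_0 = I
    K = np.atleast_2d(np.asarray(K0, dtype=float))
    for _ in range(max_iter):
        Acl = s * AF - s * BF @ K                 # scaled closed loop for K_j
        for _ in range(l):                        # step 2: l evaluation sweeps
            P = Acl.T @ P @ Acl + Q + K.T @ R @ K
        # Step 3: policy improvement with the gain formula (19).
        K_new = g * np.linalg.solve(R + g * BF.T @ P @ BF, BF.T @ P @ AF)
        if np.linalg.norm(K_new - K) < eps:       # step 4: stopping rule
            return K_new, P
        K = K_new
    raise RuntimeError("no convergence within max_iter iterations")


# Scalar check with lam = 1: the optimal gain for a = 2, b = q = r = 1 is (1 + 5**0.5) / 2.
K, P = generalized_policy_iteration(np.array([[2.0]]), np.array([[1.0]]), np.eye(1), np.eye(1),
                                    K0=[[0.0]], lam=1.0, h=1.0, l=1, eps=1e-12)
print(K)
```

With l = 1 each gain is evaluated by a single sweep before it is improved; increasing l spends more sweeps evaluating each gain, which is the trade-off between value iteration and Hewer's policy iteration described above.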

Remark 1. Theorem 1 explains how to design an LQ regulator for fast-rate models using the FOH element. This forms the basis for designing a reinforcement learning LQ regulator. Theorem 2 addresses the design of lifted systems, a common technique for multirate systems. The main outcome of Theorem 3 is the establishment of the M-Δ structure, which is crucial in robust control theory. Lastly, Theorem 4 deals with robust stability, demonstrating that fast-rate control systems are not more robust than dual-rate systems.

Remark 2. The matrices Q and R are weighting matrices, where Q is positive semidefinite (Q ≥ 0) and R is positive definite ($R > 0$). These matrices are typically chosen by trial and error. To reduce the squared error of a particular variable, larger values are assigned to the corresponding diagonal elements of Q.

In practical applications, choosing a smaller R speeds up the closed-loop response, while a larger R slows it down. More formal methods for choosing these matrices are detailed in Reference [17, Ch. 6].

The sampling rate must be high relative to the rate of change of the signals considered. A common rule of thumb is to choose a sampling rate 5–10 times the bandwidth of the system. It is also common practice to place an analog anti-aliasing filter before the sampler. Here, the relevant sampling rate is that of the fast-rate system, 1/h.

The parameter λ ∈ (0, 1] is crucial: a smaller λ confines the closed-loop eigenvalues to a smaller disc and therefore leads to faster convergence of the states.

Remark 3. This paper explores multirate (dual-rate) discrete-time systems that use the FOH element for holding. The FOH-based D/A converter achieves higher accuracy than a ZOH. Multirate systems are important both theoretically and practically.

The first key finding is that, without unmodeled dynamics, a dual-rate inferential system is equivalent to a single-rate (fast-rate) system. A corresponding LQ regulator with a prescribed degree of stability ensures that all closed-loop poles lie within the λ-circle in the complex plane, where $λ \leq 1$. The regulator design process is offline and uses dynamic programming. For single-rate systems without unmodeled dynamics, a reinforcement learning (RL) LQ regulator based on adaptive dynamic programming is proposed, with an online design process. The ultimate aim is to develop a model-free RL LQ controller.

The second key result addresses robust stability in the presence of unmodeled dynamics for LQ controllers. This involves transforming the dual-rate system into a single-rate (lifted) model and establishing the $M - Δ$ structure that is central to robust control theory. The system is shown to be stable for all admissible perturbations $Δ$ provided that $‖ M ‖_{\infty} < 1$. A significant finding is that a fast single-rate system is no more robust than the dual-rate inferential system. This result can be extended to RL LQ regulators by combining Hewer’s convergence results with Theorem 4.

Remark 4. The issue of complexity in prescribed performance is discussed in Reference.⁴⁰ The application of adaptive dynamic programming to sliding-mode systems is presented in Reference.⁴¹

Illustrative example

The selected illustrative example is a dynamic system involving the translational movement of two elastically coupled masses shown in Figure 11. The system consists of two rigid bodies with masses m₁ and m₂, which are connected by a spring with an elasticity coefficient k. The bodies move without friction along a fixed horizontal surface. A force, or control signal u, acts on the left body. This subsystem is very common in various mechatronic systems.

Figure 11.

A mechatronic subsystem as an illustrative example of a controlled plant P_c.

Let x₁ and v₁ be the position and velocity of the left rigid body, and x₂ and v₂ the position and velocity of the right body. The state vector of the dynamic system is $x = [x_{1}\ x_{2}\ v_{1}\ v_{2}]^{T}$ (so that, in what follows, $x_{3} = v_{1}$ and $x_{4} = v_{2}$), and the continuous-time model of the dynamic system is the state-space model

\dot{x} = A_{c} x + B_{c} u, \quad A_{c} = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -\frac{k}{m_{1}} & \frac{k}{m_{1}} & 0 & 0 \\ \frac{k}{m_{2}} & -\frac{k}{m_{2}} & 0 & 0 \end{bmatrix}, \quad B_{c} = \begin{bmatrix} 0 \\ 0 \\ \frac{1}{m_{1}} \\ 0 \end{bmatrix}

(46)

In this illustrative example, we adopt the parameter values $k = m_{1} = m_{2} = 1$; this choice does not affect the generality of the discussion.
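For numerical experiments, model (46) can be assembled as below. This is a small sketch with a hypothetical function name; it checks two physical properties of the model: a rigid-body translation of both masses produces no acceleration, and stretching the spring pulls the bodies together. With k = m₁ = m₂ = 1 the eigenvalues are a double zero (rigid-body motion) and ±j√2, the spring oscillation at ω = √(k(1/m₁ + 1/m₂)).

```python
import numpy as np

def two_mass_model(k=1.0, m1=1.0, m2=1.0):
    """Continuous-time matrices of model (46), state x = [x1, x2, v1, v2]^T,
    force u applied to the left body (mass m1)."""
    A_c = np.array([
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [-k / m1, k / m1, 0.0, 0.0],
        [k / m2, -k / m2, 0.0, 0.0],
    ])
    B_c = np.array([[0.0], [0.0], [1.0 / m1], [0.0]])
    return A_c, B_c

A_c, B_c = two_mass_model()  # k = m1 = m2 = 1, as in the example

# Equal displacements of both bodies (rigid-body translation): no acceleration.
print(A_c @ np.array([1.0, 1.0, 0.0, 0.0]))
# Displacing body 2 alone stretches the spring: body 1 accelerates toward it
# (+k/m1) and body 2 is pulled back (-k/m2).
print(A_c @ np.array([0.0, 1.0, 0.0, 0.0]))
print(np.linalg.eigvals(A_c))  # approximately 0, 0, +j*sqrt(2), -j*sqrt(2)
```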

Sampling (46) with sampling period h and an FOH gives the discrete-time model:

\begin{bmatrix} x((k + 1) h) \\ u(kh) \end{bmatrix} = \begin{bmatrix} A & B_{1} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x(kh) \\ u((k - 1) h) \end{bmatrix} + \begin{bmatrix} B_{2} \\ I \end{bmatrix} u(kh)

where:

A = e^{A_{c} h}

B_{1} = \int_{0}^{h} (\frac{η}{h} - 1) e^{A_{c} η} B_{c} d η

B_{2} = \int_{0}^{h} (2 - \frac{η}{h}) e^{A_{c} η} B_{c} d η
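The integrals for B₁ and B₂ need not be evaluated by quadrature. Writing G₀ = ∫₀ʰ e^{A_c η} B_c dη and G₁ = ∫₀ʰ (h − η) e^{A_c η} B_c dη gives B₁ = −G₁/h and B₂ = G₀ + G₁/h, and both G₀ and G₁ appear as blocks of the exponential of a single augmented matrix. The sketch below uses this identity; it is an illustration, not the authors' code, and it includes a small Taylor-series `expm` helper only so that it depends on NumPy alone (a library routine such as SciPy's `expm` would normally be preferred). The names `foh_discretize` and `augmented_model` are hypothetical.

```python
import numpy as np

def expm(M, terms=20):
    """Matrix exponential by scaling and squaring of a truncated Taylor series."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / 2.0**s
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for j in range(1, terms):
        term = term @ X / j
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def foh_discretize(A_c, B_c, h):
    """A, B1, B2 of x((k+1)h) = A x(kh) + B1 u((k-1)h) + B2 u(kh) (causal FOH).

    exp(h*M) with M = [[A_c, B_c, 0], [0, 0, I], [0, 0, 0]] contains
    G0 = int_0^h e^{A_c s} B_c ds and G1 = int_0^h (h - s) e^{A_c s} B_c ds.
    """
    n, m = B_c.shape
    M = np.zeros((n + 2 * m, n + 2 * m))
    M[:n, :n] = A_c
    M[:n, n:n + m] = B_c
    M[n:n + m, n + m:] = np.eye(m)
    E = expm(h * M)
    A = E[:n, :n]
    G0 = E[:n, n:n + m]
    G1 = E[:n, n + m:]
    return A, -G1 / h, G0 + G1 / h   # A, B1, B2

def augmented_model(A, B1, B2):
    """z(kh) = [x(kh); u((k-1)h)] gives z((k+1)h) = Phi z(kh) + Gamma u(kh)."""
    n, m = B1.shape
    Phi = np.block([[A, B1], [np.zeros((m, n)), np.zeros((m, m))]])
    Gamma = np.vstack([B2, np.eye(m)])
    return Phi, Gamma

# Scalar check: dx/dt = -x + u, h = 1.
a_d, b1, b2 = foh_discretize(np.array([[-1.0]]), np.array([[1.0]]), h=1.0)
print(a_d, b1, b2)

# Two-mass plant with k = m1 = m2 = 1 and h = 1 (the setting of the example).
A_c = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [-1, 1, 0, 0], [1, -1, 0, 0]], dtype=float)
B_c = np.array([[0.0], [0.0], [1.0], [0.0]])
Phi, Gamma = augmented_model(*foh_discretize(A_c, B_c, h=1.0))
print(Phi.shape, Gamma.shape)   # (5, 5) (5, 1)
```

For the scalar plant ẋ = −x + u with h = 1, the sketch gives A = e⁻¹, B₁ = −e⁻¹ and B₂ = 1. Since B₁ + B₂ = G₀, a constant input produces the same state update as with a ZOH, which is a convenient sanity check.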

For λ = 0.30, h = 1, $ε = 0.0005$, P₀ = I, Q = diag(1, 1, 10, 10, 10) (Q weights the augmented state [x(kh); u((k−1)h)]), R = 0.8 and initial gain K₀ = [0.397 0.82 0.27 0.63 0.29], Algorithm 2 yields K = [0.98671 −0.054444 1.1925 0.89862 −0.26683] in the control law

u(kh) = -K \begin{bmatrix} x(kh) \\ u((k - 1) h) \end{bmatrix}

(47)

That is,

\begin{matrix} u(kh) = -0.99 x_{1}(kh) + 0.05 x_{2}(kh) - 1.19 x_{3}(kh) \\ - 0.90 x_{4}(kh) + 0.27 u(kh - h) \end{matrix}
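As a quick arithmetic check, the sketch below evaluates the control law (47), u(kh) = −K z(kh) with z(kh) = [x₁ x₂ x₃ x₄ u((k−1)h)]ᵀ, directly from the gain reported for λ = 0.30, and prints the coefficients of −K rounded to two decimals. The function name `control` is illustrative, and the state used is the initial condition of Figure 13, taking the previous control value as 0.

```python
import numpy as np

# Gain reported for lambda = 0.30; z(kh) = [x1, x2, x3, x4, u((k-1)h)].
K = np.array([0.98671, -0.054444, 1.1925, 0.89862, -0.26683])

def control(z):
    """Control law (47): u(kh) = -K z(kh)."""
    return float(-K @ z)

# Coefficients of the expanded law, rounded to two decimals.
print(np.round(-K, 2))

# Initial condition of Figure 13 with the previous control value set to 0.
z0 = np.array([-1.0, 1.0, 1.0, -1.0, 0.0])
print(control(z0))   # approximately 0.747
```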

The convergence of Algorithm 2 and the resulting closed-loop behavior are illustrated in Figures 12 and 13.

Figure 12.

Convergence of the gain vector K for different values of λ.

Figure 13.

Motion of the autonomous dynamic system (47 and 48) for initial conditions (x₁, x₂, x₃, x₄) = (−1, 1, 1, −1) and u = 0. A smaller λ (0.30) yields a faster transient, at the cost of larger changes in the control-signal amplitude.

The figures show that for a smaller λ, that is, a smaller disk containing the closed-loop poles, the state of the dynamic system converges faster.

Conclusions

This paper discusses the design of a reinforcement learning LQ regulator, focusing on two key aspects:

(i) Multirate operation. Many real-world systems inherently operate at multiple rates, and multirate control offers benefits such as improved stability margins, simultaneous stabilization,⁴² and decentralized control.⁴³

(ii) Prescribed degree of stability. The regulator minimizes a quadratic cost while ensuring that all closed-loop poles lie inside the disk |z| < λ of the z-plane, with λ ∈ (0, 1]. This approach is less sensitive to uncertainties in plant parameters than conventional methods, although the gain margin may change.

The study employs a sampled-data model based on the FOH, introduces an LQ reinforcement learning regulator, and addresses robustness using the lifting technique. Future research could focus on regulator designs that do not require knowledge of the system model and on frequency-domain aspects.⁴⁴ The design of RL LQ regulators for multivariable and continuous-time systems is another interesting direction.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Saša Ćuković

Data availability statement

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

References

1. Shah, Chen, et al. Application of dual-rate modeling to CCR octane quality inferential control. IEEE Trans Control Syst Technol 2003; 11(1): 43–51.

2. Luchini, Schirrer, Jakubek, et al. Model predictive multirate control for mixed-integer optimisation of redundant refrigeration circuits. J Process Control 2019; 76(1): 112–121.

3. Kranc. Input-output analysis of multirate feedback systems. IRE Trans Autom Control 1957; 3(1): 21–28.

4. Kalman, Bertram JE. A unified approach to the theory of sampling systems. J Franklin Inst 1959; 267(5): 405–436.

5. Friedland. Sampled-data control systems containing periodically varying members. In: Proceedings of the 1st IFAC World Congress, Moscow, USSR, 1961, pp. 361–367.

6. Meyer DG. A new class of shift-varying operators, their shift-invariant equivalents, and multirate digital systems. IEEE Trans Automat Contr 1990; 35(4): 429–433.

7. Sagfors, Toivonen, Lennartson. H∞ control of multirate sampled-data systems: a state space approach. Automatica 1998; 34(4): 415–429.

8. Ding, Chen. Hierarchical identification of lifted state-space models for general dual-rate systems. IEEE Trans Circuits Syst 2005; 52(6): 1179–1187.

9. Filipovic VZ. Outlier robust identification of dual-rate Hammerstein models in the presence of unmodeled dynamics. Int J Robust Nonlinear Control 2022; 32(3): 1162–1179.

10. Filipovic. Robust identification of dual-rate systems based on accelerated stochastic approximation and Bayesian information criterion. Am J Eng Res 2022; 11(3): 41–51.

11. Joseph. A tutorial of inferential control and its applications. In: Proceedings of the American Control Conference, San Diego, CA, 1999, pp. 3106–3118.

12. Brosilow, Tong. Inferential control of processes: Part II. The structure and dynamics of inferential control systems. AIChE J 1978; 24(3): 492–500.

13. Teng, Qiu, et al. Filtering design for multirate sampled-data systems. IEEE Trans Syst Man Cybern Syst 2020; 50(11): 4224–4232.

14. Shah, Chen. Identification of fast-rate models from multirate data. Int J Control 2001; 74(7): 680–689.

15. Shah, Chen. Analysis of dual-rate inferential control systems. Automatica 2002; 38(6): 1053–1059.

16. Yuz, Goodwin. Sampled-data models for linear and nonlinear systems. London: Springer, 2014.

17. Anderson, Moore. Optimal control: linear quadratic methods. New Jersey: Prentice-Hall, 1989.

18. Anderson BDO, Moore. Linear system optimisation with prescribed degree of stability. Proc IEE 1969; 116(12): 2083–2087.

19. Lewis, Vrabie. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst Mag 2009; 9(3): 32–50.

20. Bertsekas, Tsitsiklis. Neuro-dynamic programming. Belmont, MA: Athena Scientific, 1996.

21. Sutton, Barto. Reinforcement learning: an introduction. Cambridge, MA: MIT Press, 2018.

22. Bertsekas. Reinforcement learning and optimal control. Belmont, MA: Athena Scientific, 2019.

23. Meyn. Control systems and reinforcement learning. Cambridge: Cambridge University Press, 2022.

24. Zhang, Liu, Luo, et al. Adaptive dynamic programming for control: algorithms and stability. Berlin: Springer, 2013.

25. Hewer. An iterative technique for the computation of the steady state gains for the discrete optimal regulator. IEEE Trans Automat Contr 1971; 16(4): 382–384.

26. Astrom, Wittenmark. Computer controlled systems. New Jersey: Prentice-Hall, 1997.

27. Granzotto, De Silva, Postoyan, et al. Policy iteration: for want of recursive feasibility, all is not lost. arXiv preprint 2022; arXiv:2210.14459v1.

28. Modares, Naghibi Sistani, Lewis FL. A policy iteration approach to online optimal control of continuous-time constrained-input systems. ISA Trans 2013; 52(5): 611–621.

29. Matni, Proutiere, Rantzer, et al. From self-tuning regulators to reinforcement learning and back again. In: Proceedings of the IEEE 58th Conference on Decision and Control (CDC), Nice, France, 11–13 December 2019, pp. 3724–3740.

30. Yaghmaie, Gustafsson, Ljung. Linear quadratic control using model-free reinforcement learning. IEEE Trans Automat Contr 2023; 68(2): 737–752.

31. Mohammadi, Zare, Soltanolkotabi, et al. Convergence and sample complexity of gradient methods for the model-free linear–quadratic regulator problem. IEEE Trans Automat Contr 2022; 67(5): 2435–2450.

32. Vrabie, Vamvoudakis, Lewis FL. Optimal adaptive control and differential games by reinforcement learning principles. London: The Institution of Engineering and Technology, 2013.

33. Lewis, Fan. Reinforcement learning: optimal feedback control with industrial applications. Berlin: Springer, 2023.

34. Dimitrakakis, Ortner. Decision making under uncertainty and reinforcement learning: theory and algorithms. Berlin: Springer, 2022.

35. Yan, Zhang, Sun, et al. Sliding mode control based on reinforcement learning for T-S fuzzy fractional-order multiagent system with time-varying delays. IEEE Trans Neural Netw Learn Syst 2024; 35(8): 10368–10379.

36. Vidyasagar. A tutorial introduction to reinforcement learning. SICE J Control Meas Syst Integr 2023; 16(1): 172–191.

37. Wang, Liang, et al. Deep reinforcement learning: a survey. IEEE Trans Neural Netw Learn Syst 2024; 35(4): 5064–5078.

38. Chen, Francis. Optimal sampled-data control systems. Berlin: Springer, 1995.

39. Zhou, Doyle, Glover. Robust and optimal control. New Jersey: Prentice-Hall, 1996.

40. Huang, Niu, Wang, et al. Prescribed performance-based low-complexity adaptive 2-bit-triggered control for unknown nonlinear systems with actuator dead-zone. IEEE Trans Circuits Syst II Express Briefs 2024; 71(2): 762–766.

41. Liu, Wang, Liu, et al. Sliding-mode surface-based adaptive optimal nonzero-sum games for saturated nonlinear multi-player systems with identifier-critic networks. Neurocomputing 2024; 584: 127575.

42. Khargonekar, Poolla, Tannenbaum. Robust control of linear time-invariant plants using periodic compensation. IEEE Trans Automat Contr 1985; 30(11): 1088–1096.

43. Anderson, Moore. Time-varying feedback laws for decentralized control. IEEE Trans Automat Contr 1981; 26(5): 1133–1139.

44. Haren, Blanken, Oomen. Frequency domain identification of multirate systems: a lifted local polynomial modeling approach. In: Proceedings of the IEEE 61st Conference on Decision and Control, Cancun, Mexico, 2022, pp. 2795–2800.

Design of LQ regulators with prescribed degree of stability for dual-rate systems based on reinforcement learning
