Model-free optimal decentralized sliding mode control for modular and reconfigurable robots based on adaptive dynamic programming

Abstract

In this article, a model-free decentralized sliding mode control method based on an adaptive dynamic programming algorithm is proposed to solve the optimal trajectory tracking control problem of modular and reconfigurable robots. The dynamic model of a modular and reconfigurable robot is formulated as a synthesis of joint subsystems with interconnected dynamic couplings. Based on the sliding mode control technique, the optimal control problem of the modular and reconfigurable robot system is transformed into an optimal compensation problem for the unknown dynamics of each joint subsystem, in which the interconnected dynamic coupling effects among the subsystems are approximated by a developed neural network identifier. Based on a policy iteration scheme and the adaptive dynamic programming algorithm, the Hamilton–Jacobi–Bellman equation is solved using a critic neural network, from which the optimal control policy is obtained. The closed-loop system is proved to be asymptotically stable using Lyapunov theory. Finally, simulation results are provided to demonstrate the effectiveness of the method.

Keywords

Decentralized control, optimal control, modular and reconfigurable robots, adaptive dynamic programming

Introduction

Modular and reconfigurable robots (MRRs)¹ have attracted wide attention in the robotics community because they offer better structural adaptability and flexibility than conventional robots. To date, MRRs have been applied in many extreme settings, such as space exploration, disaster assistance, hazard surveys, and medical assistance. Correspondingly, appropriate control systems are required to guarantee the accuracy and efficiency of MRRs across different tasks.

As a useful tool for dealing with disturbances, the sliding mode control (SMC) technique can effectively improve the robustness of nonlinear systems. Several insightful papers address stabilization and tracking control problems using SMC. Saleh and Fairouz² investigated a robust adaptive second-order SMC method for the tracking problem of a class of uncertain linear systems with matched and unmatched disturbances. Donya and Saleh³ proposed an adaptive super-twisting decoupled terminal SMC technique for a class of fourth-order systems. Saleh and Fairouz⁴ offered an adaptive global second-order sliding surface for perturbed dynamical systems with matched and unmatched external disturbances. Moreover, SMC has been widely used to design controllers for manipulators. A sliding mode robust control method was presented for a pan-tilt joint manipulator.⁵ A sliding mode adaptive neural network (NN) control was presented for a nonholonomic mobile manipulator.⁶ Other studies have addressed control problems of manipulators,^7–9 and these methods have been further applied to MRR systems. A stable adaptive fuzzy SMC method was proposed for an MRR to accommodate modular software.¹⁰ Slotine and Sastry¹¹ applied the SMC technique to 2-degree-of-freedom (DOF) rigid MRRs to track time-varying reference trajectories. Ficola and Cava¹² presented an SMC method with two sliding surfaces and applied it to a two-joint MRR. However, the above-mentioned methods do not consider improving the efficiency of MRRs. MRRs are often deployed in extreme settings without an external power supply; therefore, an ideal MRR controller should guarantee the robustness of the robotic system while simultaneously optimizing a composite of output power and error characteristics.

Optimal control was developed more than five decades ago by Bellman¹³ and Pontryagin,¹⁴ and many practical and theoretical results have been reported since.^15–18 From a mathematical perspective, an optimal control problem is addressed by minimizing a prescribed performance index, and its solution can be obtained by solving, usually approximately, the Hamilton–Jacobi–Bellman (HJB) equation, which gives sufficient conditions for optimality. To address the tracking problem of wheeled mobile robots, Lewis and Syrmos¹⁸ investigated a near-optimal tracking control method based on a receding-horizon dual-heuristic programming algorithm. Based on reinforcement learning theory, Bhasin et al.¹⁹ addressed the optimal coordination control problem of multiple robots tracking targets with desired trajectories. Nageshrao et al.²⁰ proposed an optimal passive control method for a 2-DOF manipulator using energy balance theory. Tang et al.²¹ proposed a learning-based adaptive optimal control method to solve the tracking problem of n-link robots. All of the above methods are centralized control schemes. However, an important property of MRRs is that their modules can be replaced, removed, or appended freely without adjusting the control parameters of the other modules, so there are physical restrictions on information exchange among the joint modules of the robotic system. Such restrictions make it impossible to adopt a centralized control method for MRR systems. To overcome the drawbacks of conventional centralized control schemes, Li et al.²² presented a decentralized robust control method for MRRs based on a self-tuned feedback gain. In our previous work, we also investigated decentralized robust control,²³ decentralized trajectory tracking control,²⁴ decentralized fault-tolerant control,²⁵ and decentralized force/position control²⁶ for MRR systems.
However, these MRR control methods do not consider the optimality of the controller, that is, guaranteeing the stability of the robotic system while simultaneously optimizing a composite of error characteristics and output energy efficiency. Some researchers have combined adaptive dynamic programming (ADP)-based optimal methods with decentralized control schemes. Based on ADP theory, Bian et al.²⁷ presented a decentralized optimal controller for systems with unmatched uncertainties. Zhao et al.²⁸ proposed a policy iteration–based decentralized scheme for large-scale systems with mismatched interconnections. Dong et al.²⁹ addressed the optimal control problems of MRRs by combining model-based compensation control with ADP-based learning control, and they later extended this work to the optimal tracking control of MRRs in uncertain environments.³⁰ ADP-based decentralized methods have also been proposed to solve the stabilization control problem of complex robot manipulators and nonlinear systems. However, to the best of the authors’ knowledge, few studies have addressed model-free optimal decentralized SMC of manipulator systems, especially MRR systems.

In this article, a model-free decentralized SMC method is presented for MRR systems via the policy iteration (PI) scheme and ADP. First, the MRR dynamics are formulated as a synthesis of joint subsystems with interconnected dynamic coupling (IDC) effects. Then, based on the SMC technique, the optimal control problem is transformed into an optimal compensation problem for the unknown dynamics of each subsystem. A decentralized control strategy is designed that uses only the local information of each joint module and does not require the subsystem dynamic model, which is completely unknown. Based on the ADP method and the PI algorithm, the HJB equation is solved using a critic NN, from which the optimal control policy is derived. Using Lyapunov theory, the closed-loop MRR system is proved to be asymptotically stable. Finally, simulations are presented to verify the advantages and effectiveness of the proposed method.

The main contributions of this article can be summarized as follows:

To the best of our knowledge, this is the first time the ADP approach has been extended to the model-free decentralized optimal control problem of MRR systems. The proposed scheme can be applied to MRRs with different configurations and in different environments without changing the control parameters. Unlike conventional ADP methods, which use both an action NN and a critic NN, this research obtains the optimal control policy using only a critic NN, so no action NN needs to be trained. This effectively reduces the computational burden.

Unlike existing methods that treat the IDC effects as a system disturbance with known upper bounds, this article addresses the IDC effects, which can be orders of magnitude larger than some other system dynamics, separately and in a targeted way through the developed NN identifier-based compensation control law.

Problem statement

The dynamics of an n-DOF robot system can be formulated as follows

M (q) \overset{\cdot\cdot}{q} + C (q, \overset{\cdot}{q}) \overset{\cdot}{q} + G (q) = u

(1)

where $u \in R^{n}$ is the joint torque vector, $q \in R^{n}$ is the vector of joint positions, $M (q) \in R^{n \times n}$ is the inertia matrix, $C (q, \overset{\cdot}{q}) \in R^{n \times n}$ is the matrix of centripetal and Coriolis terms, and $G (q) \in R^{n}$ is the gravity vector.

In practical applications, such as space exploration or disaster rescue, an MRR consists of many joint modules, which leads to a heavy computational burden and a complex control structure. To address this drawback, we treat each robotic joint as a subsystem of the whole MRR system, with IDCs among the subsystems. The dynamic model of the ith subsystem is expressed as

\begin{aligned} & M_{i} (q_{i}) {\overset{\cdot\cdot}{q}}_{i} + C_{i} (q_{i}, {\overset{\cdot}{q}}_{i}) {\overset{\cdot}{q}}_{i} + G_{i} (q_{i}) + I_{i} (q, \overset{\cdot}{q}, \overset{\cdot\cdot}{q}) = u_{i} \\ & I_{i} (q, \overset{\cdot}{q}, \overset{\cdot\cdot}{q}) = \sum_{j = 1, j \neq i}^{n} \left( M_{ij} (q) {\overset{\cdot\cdot}{q}}_{j} + C_{ij} (q, \overset{\cdot}{q}) {\overset{\cdot}{q}}_{j} \right) + [M_{ii} (q) - M_{i} (q_{i})] {\overset{\cdot\cdot}{q}}_{i} \\ & \qquad + [C_{ii} (q, \overset{\cdot}{q}) - C_{i} (q_{i}, {\overset{\cdot}{q}}_{i})] {\overset{\cdot}{q}}_{i} + {\bar{G}}_{i} (q) - G_{i} (q_{i}) \end{aligned}

(2)

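where $q_{i}, {\overset{\cdot}{q}}_{i}, {\overset{\cdot\cdot}{q}}_{i}, u_{i}, {\bar{G}}_{i}$ are the ith elements of the vectors $q, \overset{\cdot}{q}, \overset{\cdot\cdot}{q}, u, G$, respectively; $M_{ij}$ and $C_{ij}$ are the $(i, j)$th elements of the matrices $M$ and $C$, respectively; $M_{i} (q_{i})$, $C_{i} (q_{i}, {\overset{\cdot}{q}}_{i})$, and $G_{i} (q_{i})$ are the local dynamic terms of the ith subsystem, which depend only on its own states; and $I_{i}$ is the IDC term.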

Letting $x_{i} = [x_{i 1}, x_{i 2}]^{T} = [q_{i}, {\overset{\cdot}{q}}_{i}]^{T}$, the subsystem dynamics can be written in state-space form as

\left\{ \begin{aligned} {\overset{\cdot}{x}}_{i 1} &= x_{i 2} \\ {\overset{\cdot}{x}}_{i 2} &= f_{i} (x_{i}) + g_{i} (x_{i}) u_{i} + ψ_{i} (x) \\ y_{i} &= x_{i} \end{aligned} \right.

(3)

where $x_{i}$ is the state of the ith subsystem

\begin{matrix} f_{i} (x_{i}) = M_{i}^{- 1} (q_{i}) [- C_{i} (q_{i}, {\overset{\cdot}{q}}_{i}) {\overset{\cdot}{q}}_{i} - G_{i} (q_{i})] \\ g_{i} (x_{i}) = M_{i}^{- 1} (q_{i}), ψ_{i} (x) = - M_{i}^{- 1} (q_{i}) I_{i} (q, \overset{\cdot}{q}, \overset{\cdot\cdot}{q}) \end{matrix}

where $ψ_{i} (x)$ is the IDC term and $x$ is the state vector of the whole MRR.
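To make the state-space form (3) concrete, the sketch below evaluates its right-hand side for one joint. The joint is modeled, purely as a hypothetical example, as a single rigid link in a vertical plane, so $M_{i} = m l^{2}$, $C_{i} = 0$, and $G_{i} = m g_{0} l \sin q_{i}$; the parameter values are illustrative. The IDC term $ψ_{i} (x)$ is passed in as a number because it depends on the other joints and, as Remark 1 below stresses, is unknown to the controller.

```python
import math
from typing import Sequence, Tuple

# Hypothetical single-link joint parameters (illustrative, not from the paper).
MASS = 1.0      # link mass [kg]
LENGTH = 0.5    # joint axis to center of mass [m]
GRAVITY = 9.81  # gravitational acceleration [m/s^2]


def f_i(x_i: Sequence[float]) -> float:
    """Drift term f_i(x_i) = M_i^{-1} (-C_i qdot_i - G_i) of (3)."""
    q, qdot = x_i
    m_i = MASS * LENGTH ** 2                                 # M_i(q_i), constant for one link
    c_i = 0.0                                                # no Coriolis/centripetal term for one link
    gravity_torque = MASS * GRAVITY * LENGTH * math.sin(q)   # G_i(q_i)
    return (-c_i * qdot - gravity_torque) / m_i


def g_i(x_i: Sequence[float]) -> float:
    """Input gain g_i(x_i) = M_i^{-1}(q_i)."""
    return 1.0 / (MASS * LENGTH ** 2)


def subsystem_rhs(x_i: Sequence[float], u_i: float, psi_i: float) -> Tuple[float, float]:
    """Right-hand side of (3). psi_i is the current value of the IDC term
    psi_i(x), supplied from outside because it depends on the other joints."""
    _, x_i2 = x_i
    return x_i2, f_i(x_i) + g_i(x_i) * u_i + psi_i
```

For a real MRR, `f_i`, `g_i`, and `psi_i` are exactly the quantities the controller is not allowed to know; the sketch is only meant to show how they enter (3).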

Assumption 1

The desired position $q_{id}$ , velocity ${\overset{\cdot}{q}}_{id}$ , and acceleration ${\overset{\cdot\cdot}{q}}_{id}$ are bounded.

To remove the norm-boundedness assumption on the IDC, the desired states of the coupled subsystems are used instead of their actual states, and the IDC term is written as

ψ_{i} (x) = Δ ψ_{i} (x, x_{jd}) + ψ_{i} (x_{i}, x_{jd})

where $x_{jd}$ denotes the desired states of the coupled subsystems, $j = 1, \dots, n$, $j \neq i$; $ψ_{i} (x_{i}, x_{jd})$ is the part of the IDC obtained by using $x_{jd}$ in place of the actual states $x_{j}$; and $Δ ψ_{i} (x, x_{jd})$ is the resulting substitution error. Equation (3) can then be rewritten as follows

\left\{ \begin{aligned} {\overset{\cdot}{x}}_{i 1} &= x_{i 2} \\ {\overset{\cdot}{x}}_{i 2} &= F_{i} (x_{i}, x_{jd}) + g_{i} (x_{i}) u_{i} (x_{i}) + Δ ψ_{i} (x, x_{jd}) \\ y_{i} &= x_{i} \end{aligned} \right.

(4)

where $F_{i} (x_{i}, x_{jd}) = ψ_{i} (x_{i}, x_{jd}) + f_{i} (x_{i})$ .

Remark 1

The system dynamic terms $ψ_{i} (x), F_{i} (x_{i}, x_{jd}), g_{i} (x_{i})$ are generally unknown, since the configuration of MRRs may change for different tasks. This is why a model-free controller must be designed for MRRs. Moreover, an appropriate model-free controller should suit different configurations without adjusting its control parameters.

Accordingly, in the following section, a model-free decentralized optimal SMC method is presented for MRRs to ensure that the closed-loop systems are asymptotically stable.

Model-free decentralized optimal SMC based on ADP

Derivation of the optimal SMC scheme

Define the joint position tracking error as

e_{i} = x_{i 1} - x_{i 1 d}

(5)

Then, the time derivative of equation (5) can be obtained as

{\overset{\cdot}{e}}_{i} = {\overset{\cdot}{x}}_{i 1} - {\overset{\cdot}{x}}_{i 1 d}

(6)

Following the SMC framework, the sliding surface is defined as follows

s_{i} (t) = {\overset{\cdot}{e}}_{i} (t) + k_{i} e_{i} (t)

(7)

where $k_{i} > 0$ is a design constant. Then, the time derivative of equation (7) is given by

{\overset{\cdot}{s}}_{i} (t) = {\overset{\cdot\cdot}{e}}_{i} (t) + k_{i} {\overset{\cdot}{e}}_{i} (t) = F_{i} + g_{i} u_{i} - {\overset{\cdot\cdot}{x}}_{i 1 d} + k_{i} {\overset{\cdot}{e}}_{i} + Δ ψ_{i}

(8)

According to equations (7) and (8), the objective of the SMC is to achieve $s_{i} (t) = {\overset{\cdot}{s}}_{i} (t) = 0$. Moreover, for MRR systems, an ideal sliding mode controller should not only guarantee the convergence of the tracking error on the sliding surface but also optimize a composite of error characteristics and output power.
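Why $s_{i} = 0$ is a useful target follows directly from (7): on the surface, ${\overset{\cdot}{e}}_{i} = - k_{i} e_{i}$, so $e_{i} (t) = e_{i} (0) e^{- k_{i} t}$. The sketch below integrates this reduced error dynamics with a forward-Euler step and compares the result with the closed form; the gain, initial error, and step size are illustrative choices, not values from the paper.

```python
import math


def sliding_variable(e: float, e_dot: float, k: float) -> float:
    """Sliding variable s_i = e_dot_i + k_i * e_i, equation (7)."""
    return e_dot + k * e


def error_on_surface(e0: float, k: float, t_end: float, dt: float = 1e-4) -> float:
    """Forward-Euler integration of e_dot = -k * e, which is the
    tracking-error dynamics imposed by s_i = 0."""
    e = e0
    for _ in range(round(t_end / dt)):
        e += dt * (-k * e)
    return e


e_num = error_on_surface(e0=0.2, k=3.0, t_end=2.0)
e_exact = 0.2 * math.exp(-3.0 * 2.0)   # closed form e(0) * exp(-k t)
```

The two values agree up to the Euler discretization error; a larger $k_{i}$ gives faster decay on the surface at the cost of a stiffer reaching phase.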

Therefore, the control objective becomes the design of an optimal SMC law $u_{i}^{*}$, for which the following performance index function is defined

J_{i} (s_{i}) = \int_{0}^{\infty} \left( Λ_{i} {(\nabla J_{i}^{*} (s_{i}))}^{2} + H_{i}^{T} H_{i} + U_{i} (s_{i}, u_{i}) \right) dt, \quad u_{i} \in Ξ_{i} (Φ_{i})

(9)

where $U_{i} (s_{i}, u_{i} (s_{i})) = s_{i}^{T} Q_{i} s_{i} + u_{i}^{T} R_{i} u_{i}$ is the utility function, with $U_{i} (0) = 0$ and $U_{i} (s_{i}, u_{i} (s_{i})) \geq 0$; $Q_{i}$ and $R_{i}$ are positive definite matrices; $H_{i}$ is a known upper-bound function of $Δ ψ_{i}$ such that $Δ ψ_{i}^{T} Δ ψ_{i} \leq H_{i}^{T} H_{i}$; and $Λ_{i}$ is a positive constant.
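For intuition about (9), the sketch below evaluates a truncated version of the integral for a scalar joint subsystem with the trapezoidal rule on sampled trajectories. It assumes that samples of $s_{i}$, $u_{i}$, $\nabla J_{i}^{*}$, and $H_{i}$ are already available on a uniform time grid (in the method itself, $\nabla J_{i}^{*}$ is exactly what the critic NN later approximates), and the finite horizon replacing the infinite upper limit is an approximation.

```python
import math
from typing import Sequence, Tuple


def integrand(s: float, u: float, grad_j: float, h: float,
              q_w: float, r_w: float, lam: float) -> float:
    """Integrand of (9) for a scalar subsystem:
    Lambda*(grad J*)^2 + H^T H + U(s, u), with U = Q s^2 + R u^2."""
    utility = q_w * s * s + r_w * u * u
    return lam * grad_j * grad_j + h * h + utility


def truncated_cost(samples: Sequence[Tuple[float, float, float, float]],
                   dt: float, q_w: float, r_w: float, lam: float) -> float:
    """Trapezoidal approximation of (9) over a finite horizon.
    Each sample is (s, u, grad_j, h) at a uniform time step dt."""
    values = [integrand(s, u, gj, h, q_w, r_w, lam) for (s, u, gj, h) in samples]
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


# Check on a case with a known answer: s(t) = exp(-t), u = grad_j = h = 0, Q = 1,
# so the cost over [0, 10] is (1 - exp(-20)) / 2.
dt = 1e-3
samples = [(math.exp(-n * dt), 0.0, 0.0, 0.0) for n in range(10001)]
cost = truncated_cost(samples, dt, q_w=1.0, r_w=0.1, lam=0.5)
```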

Note that the decentralized control $u_{i}$ must not only stabilize the robot system but also keep the performance index (9) finite; that is, the control policy must be admissible. The definition of admissible control is introduced below.

Definition 1

For system (4), a decentralized control $u_{i} (s_{i})$ is called admissible with respect to the cost function (9) on the compact set $Φ_{i}$ if it stabilizes the subsystem on $Φ_{i}$ and $J_{i} (s_{i})$ is finite for all $s_{i} \in Φ_{i}$. Given an admissible control policy $u_{i} \in Ξ_{i} (Φ_{i})$, where $Ξ_{i} (Φ_{i})$ denotes the set of admissible controls, the infinite-horizon performance index function (9) satisfies the following Lyapunov equation

\begin{matrix} 0 = H_{i}^{T} H_{i} + Λ_{i} {(\nabla J_{i}^{*} (s_{i}))}^{2} + U_{i} (s_{i}, u_{i} (s_{i})) + {(\nabla J_{i} (s_{i}))}^{T} \\ \cdot (Δ ψ_{i} + g_{i} (x_{i}) u_{i} (x_{i}) + F_{i} (x_{i}, x_{jd}) - {\overset{\cdot\cdot}{x}}_{i 1 d} + k_{i} {\overset{\cdot}{e}}_{i} (t)) \end{matrix}

(10)

where $J_{i}^{*} (s_{i})$ is the optimal performance index function and $\nabla J_{i} (s_{i}) = \partial J_{i} (s_{i}) / \partial (s_{i}), J_{i} (0) = 0$ .

Define the Hamilton function and the optimal performance index function as follows

\begin{matrix} H_{i} (s_{i}, u_{i} (s_{i}), \nabla J_{i} (s_{i})) = H_{i}^{T} H_{i} + s_{i}^{T} Q_{i} s_{i} + {u_{i}}^{T} R_{i} u_{i} \\ + {(\nabla J_{i} (s_{i}))}^{T} (F_{i} (x_{i}, x_{jd}) + g_{i} (x_{i}) u_{i} - {\overset{\cdot\cdot}{x}}_{i 1 d} + k_{i} {\overset{\cdot}{e}}_{i} + Δ ψ_{i}) \\ + Λ_{i} {(\nabla J_{i}^{*} (s_{i}))}^{2} \end{matrix}

(11)

J_{i}^{*} (s_{i}) = \min_{u_{i} \in Ξ_{i} (Φ_{i})} \int_{0}^{\infty} \left( U_{i} (s_{i}, u_{i} (s_{i})) + H_{i}^{T} H_{i} + Λ_{i} {(\nabla J_{i}^{*} (s_{i}))}^{2} \right) dt

(12)

Under the framework of optimal control design, one obtains that $J_{i}^{*} (s_{i})$ satisfies the following HJB equation

0 = \min_{u_{i} \in Ξ_{i} (Φ_{i})} H_{i} (s_{i}, u_{i} (s_{i}), \nabla J_{i}^{*} (s_{i}))

(13)

If $J_{i}^{*} (s_{i})$ is continuously differentiable, the optimal SMC law can be formulated as

u_{i}^{*} = - \frac{1}{2} R_{i}^{- 1} g_{i}^{T} (x_{i}) \nabla J_{i}^{*} (s_{i})

(14)
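Equation (14) follows because the Hamilton function (11) is quadratic in $u_{i}$: only $u_{i}^{T} R_{i} u_{i}$ and $(\nabla J_{i})^{T} g_{i} u_{i}$ depend on $u_{i}$, and setting their derivative to zero gives $2 R_{i} u_{i} + g_{i}^{T} \nabla J_{i} = 0$. The sketch below checks this numerically for a scalar subsystem by brute-force minimization of the $u_{i}$-dependent part over a grid; the values of $R_{i}$, $g_{i}$, and $\nabla J_{i}^{*}$ are arbitrary.

```python
def u_dependent_hamiltonian(u: float, r_w: float, g: float, grad_j: float) -> float:
    """Part of the Hamilton function (11) that depends on u (scalar case):
    R u^2 + grad_j * g * u. All other terms are constant in u."""
    return r_w * u * u + grad_j * g * u


def optimal_u(r_w: float, g: float, grad_j: float) -> float:
    """Closed-form minimizer (14): u* = -(1/2) R^{-1} g^T grad J*."""
    return -0.5 * g * grad_j / r_w


r_w, g, grad_j = 0.5, 4.0, 1.3
grid = [i * 1e-4 - 10.0 for i in range(200_001)]   # u in [-10, 10]
u_grid = min(grid, key=lambda u: u_dependent_hamiltonian(u, r_w, g, grad_j))
u_star = optimal_u(r_w, g, grad_j)                 # -5.2
```

Because $R_{i} > 0$, the quadratic is strictly convex, so the stationary point is the unique minimizer and the grid search lands within one grid step of it.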

We decompose $u_{i}^{*}$ as $u_{i}^{*} = u_{i 1} + u_{i 2}^{*}$, where $u_{i 1}$ compensates the IDC and $u_{i 2}^{*}$ addresses the optimal compensation problem of the unknown dynamics of the ith subsystem.

Combining equations (13) and (10), we get

\begin{matrix} {(\nabla J_{i} (s_{i}))}^{T} (F_{i} (x_{i}, x_{jd}) + Δ ψ_{i} - {\overset{\cdot\cdot}{x}}_{i 1 d} + k_{i} {\overset{\cdot}{e}}_{i} + g_{i} (x_{i}) u_{i}) \\ = - s_{i}^{T} Q_{i} s_{i} - {(u_{i 1} + u_{i 2}^{*})}^{T} R_{i} (u_{i 1} + u_{i 2}^{*}) - Λ_{i} {(\nabla J_{i}^{*} (s_{i}))}^{2} \\ - H_{i}^{T} H_{i} \end{matrix}

(15)

Next, an identifier-based controller is used to compensate IDCs.

Identification of the IDC

In this section, an identifier is presented to approximate the term $Δ ψ_{i} (x, x_{jd})$ .

Assumption 2³¹

NN activation function $σ (\cdot)$ and its derivative $σ' (\cdot)$ are bounded.

Assumption 3³²

The NN approximation error is upper bounded by an unknown constant.

Under the above assumptions, $Δ ψ_{i}$ can be represented by a single-layer NN as

Δ ψ_{i} = w_{ih}^{T} σ_{ih} (x_{ih}, x_{D}) + ε_{ih} (x_{ih})

(16)

where $σ_{ih} (x_{ih}, x_{D})$ is the NN activation function, $x_{ih}$ is the designated NN state, $w_{ih}$ is the unknown ideal NN weight, $x_{D} = [x_{1 d}, x_{2 d}, \dots, x_{md}]^{T}, m < i$, is the known and bounded reference state vector, and $ε_{ih}$ is the NN approximation error. Consider the following nonlinear dynamic system with bounded input $u_{ih}$

{\overset{\cdot}{x}}_{ih} = Δ ψ_{i} + u_{ih} = w_{ih}^{T} σ_{ih} (x_{ih}, x_{D}) + ε_{ih} (x_{ih}) + u_{ih}

(17)

The NN identifier, which is used to approximate equation (17), is as follows

{\overset{\cdot}{\hat{x}}}_{ih} = u_{ih} + Δ {\hat{ψ}}_{i} = {\hat{w}}_{ih}^{T} σ_{ih} + u_{ih} + r_{ih}

(18)

where ${\hat{x}}_{ih}$ is the identifier state that estimates $x_{ih}$, $Δ {\hat{ψ}}_{i}$ is the approximation of $Δ ψ_{i}$, ${\hat{w}}_{ih}$ is the weight estimate, and $r_{ih}$ is a robust integral of the sign of the error (RISE) feedback term

r_{ih} = k_{ih} e_{ih} + ζ_{ih}

(19)

where $e_{ih} = x_{ih} - {\hat{x}}_{ih}$ is the identification error and $ζ_{ih}$ is the generalized solution to

{\overset{\cdot}{ζ}}_{ih} = (k_{ih} α_{ih} + γ_{ih}) e_{ih} + δ_{i 1} sgn (e_{ih})

(20)

where $α_{ih}, k_{ih}, δ_{i 1}, γ_{ih}$ are positive design constants and $sgn (\cdot)$ is the signum function. Combining equation (17) with equation (18) yields the identification error dynamics

{\overset{\cdot}{e}}_{ih} = Δ {\tilde{ψ}}_{i} = w_{ih}^{T} σ_{ih} - {\hat{w}}_{ih}^{T} σ_{ih} + ε_{ih} - r_{ih}

(21)

where $Δ {\tilde{ψ}}_{i} = Δ ψ_{i} - Δ {\hat{ψ}}_{i}$ . The identification error function can be defined as follows

ρ_{ih} = {\overset{\cdot}{e}}_{ih} + α_{ih} e_{ih}

(22)

By using equations (21) and (22), one obtains that

\begin{matrix} {\overset{\cdot}{ρ}}_{ih} = & w_{ih}^{T} σ'_{ih} {\overset{\cdot}{x}}_{ih} - {\overset{\cdot}{\hat{w}}}_{ih}^{T} σ_{ih} - {\hat{w}}_{ih}^{T} σ'_{ih} {\overset{\cdot}{\hat{x}}}_{ih} + {\overset{\cdot}{ε}}_{ih} \\ - k_{ih} ρ_{ih} - γ_{ih} e_{ih} - δ_{i 1} sgn (e_{ih}) + α_{ih} {\overset{\cdot}{e}}_{ih} \end{matrix}

(23)

where the weight estimate ${\hat{w}}_{ih}$ is updated by

{\overset{\cdot}{\hat{w}}}_{ih} = proj (Γ_{ih} σ'_{ih} {\overset{\cdot}{\hat{x}}}_{ih} e_{ih}^{T})

(24)

where $proj (\cdot)$ is the projection operator and $Γ_{ih} > 0$ is a constant matrix. Then, equation (23) can be written as

\begin{matrix} {\overset{\cdot}{ρ}}_{ih} = & {\tilde{P}}_{ih 1} (w_{ih}, {\hat{w}}_{ih}, e_{ih}) + P_{ih 2} (w_{ih}, {\hat{w}}_{ih}, x_{ih}) \\ + P_{ih 3} ({\hat{x}}_{ih}, {\tilde{w}}_{ih}) - γ_{ih} e_{ih} - δ_{i 1} sgn (e_{ih}) - k_{ih} ρ_{ih} \end{matrix}

(25)

where

\begin{matrix} {\tilde{P}}_{ih 1} = α_{ih} {\overset{\cdot}{e}}_{ih} - {\overset{\cdot}{\hat{w}}}_{ih}^{T} σ_{ih} + \frac{1}{2} w_{ih}^{T} σ'_{ih} {\overset{\cdot}{e}}_{ih} + \frac{1}{2} {\hat{w}}^{T}_{ih} σ'_{ih} {\overset{\cdot}{e}}_{ih} \\ P_{ih 2} = \frac{1}{2} w_{ih}^{T} σ'_{ih} {\overset{\cdot}{x}}_{ih} - \frac{1}{2} {\hat{w}}^{T}_{ih} σ'_{ih} {\overset{\cdot}{x}}_{ih} + {\overset{\cdot}{ε}}_{ih} \\ P_{ih 3} = \frac{1}{2} {\tilde{w}}_{ih}^{T} σ'_{ih} {\overset{\cdot}{\hat{x}}}_{ih} \end{matrix}

(26)

in which ${\tilde{w}}_{ih} = w_{ih} - {\hat{w}}_{ih}$. Denote the expression for $P_{ih 3}$ in (26) by ${\hat{P}}_{ih 3}$; replacing ${\overset{\cdot}{\hat{x}}}_{ih}$ in ${\hat{P}}_{ih 3}$ with ${\overset{\cdot}{x}}_{ih}$ gives the auxiliary term $P_{ih 3} (x_{ih}, {\tilde{w}}_{ih})$, which facilitates the stability analysis. Define ${\tilde{P}}_{ih 3} = {\hat{P}}_{ih 3} - P_{ih 3}$ and $P_{ih} = P_{ih 2} + P_{ih 3}$. According to Assumptions 2 and 3 and using equations (22), (24), and (26), the following bounds can be obtained

\begin{matrix} ‖ {\tilde{P}}_{ih 1} ‖ \leq μ_{i 1} (‖ b_{i} ‖) ‖ b_{i} ‖, ‖ P_{ih 2} ‖ \leq ξ_{i 1}, ‖ P_{ih 3} ‖ \leq ξ_{i 2} \\ ‖ P_{ih} ‖ \leq ξ_{i 3} + ξ_{i 4} μ_{i 2} (‖ b_{i} ‖) ‖ b_{i} ‖ \\ ‖ {\overset{\cdot}{e}}_{ih}^{T} P_{ih 3} ‖ \leq ξ_{i 5} {‖ e_{ih} ‖}^{2} + ξ_{i 6} {‖ ρ_{ih} ‖}^{2} \end{matrix}

(27)

where $b_{i} = {[\begin{matrix} e_{ih}^{T} & ρ_{ih}^{T} \end{matrix}]}^{T}$; $μ_{i 1} (\cdot)$ and $μ_{i 2} (\cdot)$ are positive, nondecreasing, invertible functions; and $ξ_{im} > 0, m = 1, \dots, 6$, are constants. For the stability analysis, define

Q_{ih} = {[\begin{matrix} e_{ih}^{T} & ρ_{ih}^{T} & \sqrt{M_{i}} & \sqrt{N_{i}} \end{matrix}]}^{T}

(28)

where $M_{i}$ represents the generalized solution to the differential equation

\begin{matrix} {\overset{\cdot}{M}}_{i} = & - ρ_{ih}^{T} (P_{ih 2} - δ_{i 1} sgn (e_{ih})) + {\overset{\cdot}{e}}_{ih}^{T} P_{ih 3} \\ - δ_{i 2} μ_{i 2} (‖ b_{i} ‖) ‖ b_{i} ‖ ‖ e_{ih} ‖ \end{matrix}

(29)

where $δ_{i 1}$ and $δ_{i 2}$ are chosen to satisfy the following conditions, which guarantee $M_{i} \geq 0$

δ_{i 1} > max (ξ_{i 2} + ξ_{i 1}, ξ_{i 1} + \frac{ξ_{i 3}}{α_{ih}}), δ_{i 2} > ξ_{i 4}

(30)

The auxiliary function $N_{i}$ can be defined as

N_{i} = \frac{1}{4} α_{ih} [tr ({\tilde{w}}_{ih}^{T} Γ_{ih}^{- 1} {\tilde{w}}_{ih})]

(31)

in which $tr (\cdot)$ denotes the matrix trace.

Theorem 1

Consider the IDC term in equation (16) and the dynamic system in equation (17). The NN identifier (18) with the weight update law (24) guarantees asymptotic identification of the IDCs in the sense that

lim_{t \to \infty} ‖ Δ {\tilde{ψ}}_{i} ‖ = 0

(32)

provided that $k_{ih}$ and $γ_{ih}$ satisfy the following conditions

k_{ih} > ξ_{i 6}, γ_{ih} > \frac{ξ_{i 5}}{α_{ih}}

(33)

where $k_{ih}, γ_{ih}, α_{ih}$ are defined in equation (20) and $ξ_{i 5}, ξ_{i 6}$ in equation (27).

Proof

Define the Lyapunov function candidate $V_{ih} (Q_{ih}) : D_{ih} \to R$ as follows

V_{ih} (Q_{ih}) = \frac{1}{2} ρ_{ih}^{T} ρ_{ih} + \frac{1}{2} γ_{ih} e_{ih}^{T} e_{ih} + M_{i} + N_{i}

(34)

which satisfies the following relations

U_{1} (Q_{ih}) \leq V_{ih} (Q_{ih}) \leq U_{2} (Q_{ih})

(35)

where $U_{1} (Q_{ih}), U_{2} (Q_{ih})$ are continuous positive definite functions

\begin{matrix} U_{1} (Q_{ih}) = \frac{1}{2} min (γ_{ih}, 1) {‖ Q_{ih} ‖}^{2} \\ U_{2} (Q_{ih}) = max (γ_{ih}, 1) {‖ Q_{ih} ‖}^{2} \end{matrix}

(36)

The time derivative of equation (34) is

\begin{matrix} {\overset{\cdot}{V}}_{ih} & = \nabla V_{ih}^{T} {[\begin{matrix} {\overset{\cdot}{ρ}}_{ih}^{T} & {\overset{\cdot}{e}}_{ih}^{T} & \frac{1}{2} M_{i}^{- \frac{1}{2}} {\overset{\cdot}{M}}_{i} & \frac{1}{2} N_{i}^{- \frac{1}{2}} {\overset{\cdot}{N}}_{i} \end{matrix}]}^{T} \\ = [\begin{matrix} ρ_{ih}^{T} & γ_{ih} e_{ih}^{T} & 2 M_{i}^{\frac{1}{2}} & 2 N_{i}^{\frac{1}{2}} \end{matrix}] \\ {[\begin{matrix} {\overset{\cdot}{ρ}}_{ih}^{T} & {\overset{\cdot}{e}}_{ih}^{T} & \frac{1}{2} M_{i}^{- \frac{1}{2}} {\overset{\cdot}{M}}_{i} & \frac{1}{2} N_{i}^{- \frac{1}{2}} {\overset{\cdot}{N}}_{i} \end{matrix}]}^{T} \\ < ({\tilde{P}}_{ih 1} + P_{ih 2} + {\hat{P}}_{ih 3} - k_{ih} ρ_{ih} - δ_{i 1} sgn (e_{ih}) - γ_{ih} e_{ih}) \\ ρ_{ih}^{T} + γ_{ih} e_{ih}^{T} (ρ_{ih} - α_{ih} e_{ih}) - ρ_{ih}^{T} (P_{ih 2} - δ_{i 1} sgn (e_{ih})) - \\ {\overset{\cdot}{e}}_{ih}^{T} P_{ih 3} + δ_{i 2} μ_{i 2} (‖ b_{i} ‖) ‖ b_{i} ‖ ‖ e_{ih} ‖ - \frac{1}{2} α_{ih} [tr ({\tilde{w}}_{ih}^{T} Γ_{ih}^{- 1} {\overset{\cdot}{\hat{w}}}_{ih})] \\ = - α_{ih} γ_{ih} e_{ih}^{T} e_{ih} - k_{ih} ρ_{ih}^{T} ρ_{ih} + \frac{1}{2} α_{ih} e_{ih}^{T} {\tilde{w}}_{ih}^{T} σ_{ih}^{'} {\overset{\cdot}{\hat{x}}}_{ih} \\ + δ_{i 2} μ_{i 2} (‖ b_{i} ‖) ‖ b_{i} ‖ ‖ e_{ih} ‖ - \frac{1}{2} α_{ih} [tr ({\tilde{w}}_{ih}^{T} σ_{ih}^{'} {\overset{\cdot}{\hat{x}}}_{ih} e_{ih}^{T})] \\ + {\overset{\cdot}{e}}_{ih}^{T} ({\hat{P}}_{ih 3} - P_{ih 3}) + \frac{1}{2} α_{ih} e_{ih}^{T} {\hat{w}}_{ih}^{T} σ_{ih}^{'} {\overset{\cdot}{\hat{x}}}_{ih} + ρ_{ih}^{T} {\tilde{P}}_{ih 1} \end{matrix}

(37)

Canceling common terms in equation (37), splitting the gains as $k_{ih} = k_{ih 1} + k_{ih 2}$ and $γ_{ih} = γ_{ih 1} + γ_{ih 2}$, using equation (27), and completing the squares, we obtain the following upper bound on equation (37)

\begin{matrix} {\overset{\cdot}{V}}_{ih} \leq - (α_{ih} γ_{ih 1} - ξ_{i 5}) {‖ e_{ih} ‖}^{2} - (k_{i 1} - ξ_{i 6}) {‖ ρ_{ih} ‖}^{2} \\ + \frac{μ_{i 1} {(‖ b_{i} ‖)}^{2}}{4 λ_{i 2}} {‖ b_{i} ‖}^{2} + \frac{δ_{i 2}^{2} μ_{i 2} {(‖ b_{i} ‖)}^{2}}{4 α_{ih} γ_{ih 2}} {‖ b_{i} ‖}^{2} \end{matrix}

(38)

If equation (33) is satisfied, equation (38) can be written as

\begin{matrix} {\overset{\cdot}{V}}_{ih} \leq - λ_{i 1} ‖ b_{i} ‖^{2} + \frac{μ {(‖ b_{i} ‖)}^{2}}{4 λ_{i 2}} ‖ b_{i} ‖^{2} \leq - U (Q_{ih}) \\ \forall Q_{i} \in D_{ih} \end{matrix}

(39)

where $U (Q_{ih}) = α_{ic} ‖ b_{i} ‖^{2}$, with a constant $α_{ic} > 0$, is a positive semidefinite function; $λ_{i 1} = \min \{ k_{i 1} - ξ_{i 6}, α_{ih} γ_{ih 1} - ξ_{i 5} \}$ and $λ_{i 2} = \min \{ k_{i 2}, α_{ih} γ_{ih 2} / δ_{i 2}^{2} \}$; and the domain $D_{ih}$ is defined as $D_{ih} = \{ Q_{ih} \mid ‖ Q_{ih} ‖ \leq μ_{i}^{- 1} (2 \sqrt{λ_{i 1} λ_{i 2}}) \}$.

Let $R_{ih} \subset D_{ih}$ denote a set defined as

R_{ih} = \{ Q_{ih} \in D_{ih} \mid U_{2} (Q_{ih}) < \frac{1}{2} {(μ_{i}^{- 1} (2 \sqrt{λ_{i 1} λ_{i 2}}))}^{2} \}

(40)

The region of attraction in equation (40) can be made arbitrarily large, to include any initial condition, by increasing the control gains. Then, we obtain

α_{ic} ‖ b_{i} ‖^{2} \to 0

(41)

as $t \to \infty$, for all $Q_{ih} (0) \in R_{ih}$.

By the definitions of $b_{i}$ and $ρ_{ih}$, it follows that $‖ e_{ih} ‖, ‖ ρ_{ih} ‖, ‖ {\overset{\cdot}{e}}_{ih} ‖ \to 0$ as $t \to \infty$ for all $Q_{ih} (0) \in R_{ih}$. Since ${\overset{\cdot}{e}}_{ih} = Δ {\tilde{ψ}}_{i}$ by equation (21), we obtain $\lim_{t \to \infty} ‖ Δ {\tilde{ψ}}_{i} ‖ = 0$, which completes the proof.
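Theorem 1 can be illustrated numerically. The sketch below simulates the identifier (18)–(20) for a single joint with the NN term switched off (${\hat{w}}_{ih} = 0$, a simplification), so that the RISE feedback $r_{ih}$ alone must reconstruct a hypothetical smooth coupling signal $Δ ψ_{i} (t) = \sin t + 0.5 \cos 2t$. The gains are illustrative and were picked so that $δ_{i 1}$ comfortably exceeds $\sup |Δ {\overset{\cdot}{ψ}}_{i}| + \sup |Δ {\overset{\cdot\cdot}{ψ}}_{i}| / α_{ih}$, a sufficient-gain condition of the type used in RISE analyses. Forward Euler is used, so the errors settle to a small residual rather than exactly zero.

```python
import math


def simulate_rise_identifier(t_end=10.0, dt=1e-4, k=10.0, alpha=2.0,
                             gamma=5.0, delta=5.0):
    """Simulate x_dot = dpsi(t) + u and the identifier xhat_dot = u + r,
    with r = k*e + zeta and zeta_dot = (k*alpha + gamma)*e + delta*sgn(e).
    The NN term w_hat^T sigma is omitted (w_hat = 0) to isolate the RISE term.
    Returns the final identification error e and the estimation error dpsi - r."""
    def dpsi(t):                       # hypothetical unknown coupling signal
        return math.sin(t) + 0.5 * math.cos(2.0 * t)

    def sgn(v):
        return (v > 0) - (v < 0)

    x = x_hat = zeta = 0.0
    t = 0.0
    for _ in range(round(t_end / dt)):
        u = 0.0                        # any known bounded input works here
        e = x - x_hat                  # identification error
        r = k * e + zeta               # RISE feedback term (19)
        x += dt * (dpsi(t) + u)        # "true" dynamics (17)
        x_hat += dt * (u + r)          # identifier (18) with w_hat = 0
        zeta += dt * ((k * alpha + gamma) * e + delta * sgn(e))   # (20)
        t += dt
    e = x - x_hat
    return e, dpsi(t) - (k * e + zeta)


e_final, psi_error = simulate_rise_identifier()
```

In the full scheme the NN term ${\hat{w}}_{ih}^{T} σ_{ih}$ absorbs most of the structured part of $Δ ψ_{i}$, which lets the discontinuous gain $δ_{i 1}$ cover only the remaining error.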

According to equations (18), (19), and (20), we can design $u_{i 1}$ as

u_{i 1} = - g_{i} (x_{i}) (\begin{matrix} \int_{0}^{t} (\begin{matrix} (k_{ih} α_{ih} + γ_{ih}) e_{ih} \\ + δ_{i 1} sgn (e_{ih}) \end{matrix}) dt \\ + k_{ih} e_{ih} + {\hat{w}}_{ih}^{T} σ_{ih} \end{matrix})

(42)

where the weight ${\hat{w}}_{ih}$ is updated by equation (24).
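In discrete time, (42) needs only one running integral per joint and uses only local signals, which is what keeps it decentralized and cheap. The sketch below is a hypothetical per-joint implementation (the class name `IdcCompensator` is illustrative) with a forward-Euler accumulator for the integral term. The value of ${\hat{w}}_{ih}^{T} σ_{ih}$ is passed in as a number because the weight law (24) is not reimplemented here, and $g_{i} (x_{i})$ multiplies the bracket exactly as written in (42).

```python
from dataclasses import dataclass


@dataclass
class IdcCompensator:
    """Hypothetical per-joint implementation of the IDC compensation law (42)."""
    k: float               # k_ih
    alpha: float           # alpha_ih
    gamma: float           # gamma_ih
    delta: float           # delta_i1
    integral: float = 0.0  # running value of the integral term in (42)

    def step(self, e: float, nn_term: float, g: float, dt: float) -> float:
        """Advance the integral by one Euler step and return u_i1.

        e       -- identification error e_ih
        nn_term -- current value of w_hat_ih^T sigma_ih
        g       -- current value of g_i(x_i)
        """
        sgn_e = (e > 0) - (e < 0)
        self.integral += dt * ((self.k * self.alpha + self.gamma) * e
                               + self.delta * sgn_e)
        return -g * (self.integral + self.k * e + nn_term)


comp = IdcCompensator(k=10.0, alpha=2.0, gamma=5.0, delta=5.0)
u_first = comp.step(e=0.1, nn_term=0.2, g=1.0, dt=0.01)   # -1.275
```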

Critic NN implementation

To find the optimal control of the MRR (equation (2)), the HJB equation (13) must be solved for $\nabla J_{i}^{*} (s_{i})$, from which $u_{i}^{*} (s_{i})$ is obtained. However, equation (13) is a nonlinear partial differential equation that is difficult to solve because of its heavy computational burden. Therefore, a critic NN is employed to approximate $J_{i}^{*} (s_{i})$.

Because the performance index function is highly nonlinear and non-analytic, a critic NN is used to approximate $J_{i} (s_{i})$ as

J_{i} (s_{i}) = {W_{ic}}^{T} δ_{ic} (s_{i}) + ε_{ic}

(43)

where $W_{ic} \in R^{{\bar{N}}_{i}}$ is the ideal weight vector, ${\bar{N}}_{i}$ is the number of neurons in the hidden layer, $δ_{ic}$ is the activation function, and $ε_{ic}$ is the critic NN approximation error. Hence, the gradient $\nabla J_{i} (s_{i})$ can be represented as

\nabla J_{i} (s_{i}) = {(\nabla δ_{ic} (s_{i}))}^{T} W_{ic} + \nabla ε_{ic}

(44)

where $\nabla ε_{ic}$ and $\nabla δ_{ic} (s_{i}) = \partial δ_{ic} (s_{i}) / \partial s_{i} \in R^{{\bar{N}}_{i} \times n}$ are the gradients of the approximation error and the activation function, respectively.
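As a concrete instance of (43) and (44) for a scalar sliding variable, the sketch below uses a hypothetical even polynomial basis $δ_{ic} (s) = [s^{2}, s^{4}]^{T}$ (a common choice for value functions that vanish at $s = 0$, but not one specified in this article), ignores the approximation error $ε_{ic}$, and checks the analytic gradient $(\nabla δ_{ic})^{T} W_{ic}$ against a central finite difference.

```python
from typing import List


def basis(s: float) -> List[float]:
    """Hypothetical critic activation vector delta_ic(s) = [s^2, s^4]."""
    return [s ** 2, s ** 4]


def basis_grad(s: float) -> List[float]:
    """Derivative of each basis function with respect to s."""
    return [2.0 * s, 4.0 * s ** 3]


def critic_value(w: List[float], s: float) -> float:
    """J(s) ~= W^T delta_ic(s), i.e. (43) without epsilon_ic."""
    return sum(wi * bi for wi, bi in zip(w, basis(s)))


def critic_grad(w: List[float], s: float) -> float:
    """grad J(s) ~= (grad delta_ic(s))^T W, i.e. (44) without grad epsilon_ic."""
    return sum(wi * di for wi, di in zip(w, basis_grad(s)))


w = [1.5, 0.2]
s, h = 0.7, 1e-6
fd = (critic_value(w, s + h) - critic_value(w, s - h)) / (2 * h)
```

With this basis, $J_{i} (0) = 0$ holds by construction, matching the requirement stated after (10).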

For system (2), combining equations (10) and (44), we get

\begin{matrix} 0 = & H_{i}^{T} H_{i} + Λ_{i} {({(\nabla δ_{ic} (s_{i}))}^{T} W_{ic} + \nabla ε_{ic})}^{T} \\ \times ({(\nabla δ_{ic} (s_{i}))}^{T} W_{ic} + \nabla ε_{ic}) \\ + {({(\nabla δ_{ic} (s_{i}))}^{T} W_{ic} + \nabla ε_{ic})}^{T} {\overset{\cdot}{s}}_{i} + U_{i} (s_{i}, u_{i} (s_{i})) . \end{matrix}

(45)

Substituting equation (44) into the Hamilton function (11), it can be reformulated as

\begin{matrix} H_{i} (s_{i}, u_{i} (s_{i}), W_{ic}) = H_{i}^{T} H_{i} + Λ_{i} \\ \cdot {({(\nabla δ_{ic} (s_{i}))}^{T} W_{ic} + \nabla ε_{ic})}^{T} \times ({(\nabla δ_{ic} (s_{i}))}^{T} W_{ic} + \nabla ε_{ic}) \\ + {({(\nabla δ_{ic} (s_{i}))}^{T} W_{ic} + \nabla ε_{ic})}^{T} {\overset{\cdot}{s}}_{i} + U_{i} (s_{i}, u_{i} (s_{i})) \\ = - \nabla {ε_{ic}}^{T} {\overset{\cdot}{s}}_{i} = e_{iJh}, \end{matrix}

(46)

where $e_{iJh}$ is the residual error caused by the NN approximation. Because the ideal weight vector $W_{ic}$ is unknown, the critic NN is implemented as

{\hat{J}}_{i} (s_{i}) = {\hat{W}}_{ic}^{T} δ_{ic} (s_{i}),

(47)

where ${\hat{W}}_{ic}$ is the estimation of $W_{ic}$ and ${\hat{J}}_{i} (s_{i})$ is the estimation of $J_{i} (s_{i})$ .

The partial derivative of ${\hat{J}}_{i} (s_{i})$ can be formulated as

\nabla {\hat{J}}_{i} (s_{i}) = {(\nabla δ_{ic} (s_{i}))}^{T} {\hat{W}}_{ic}

(48)

Then, one can obtain the approximate Hamilton function as

\begin{matrix} H_{i} (s_{i}, u_{i} (s_{i}), {\hat{W}}_{ic}) = H_{i}^{T} H_{i} + Λ_{i} \\ {({(\nabla δ_{ic} (s_{i}))}^{T} {\hat{W}}_{ic})}^{T} ({(\nabla δ_{ic} (s_{i}))}^{T} {\hat{W}}_{ic}) \\ + {({(\nabla δ_{ic} (s_{i}))}^{T} {\hat{W}}_{ic})}^{T} {\overset{\cdot}{s}}_{i} + U_{i} (s_{i}, u_{i} (s_{i})) = {e_{i}}_{J} \end{matrix}

(49)

The critic NN weight vector ${\hat{W}}_{ic}$ is tuned by minimizing the objective function $E_{iJ} = (1 / 2) e_{iJ}^{T} e_{iJ}$ with the gradient descent algorithm, which gives the update law

{\overset{\cdot}{\hat{W}}}_{ic} = - α_{i} e_{iJ} \nabla δ_{ic} (s_{i}) {\overset{\cdot}{s}}_{i}

(50)

where $α_{i} > 0$ is the learning rate of the critic NN. Denote $υ_{i} = \nabla δ_{ic} (s_{i}) {\overset{\cdot}{s}}_{i}$, and assume that there exists a positive constant $υ_{iL}$ such that $‖ υ_{i} ‖ \leq υ_{iL}$.
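The update law (50) is gradient descent on $E_{iJ}$. For illustration, the sketch below treats the residual as affine in the weights, $e_{iJ} = c_{i} + {\hat{W}}_{ic}^{T} υ_{i}$, where the hypothetical scalar $c_{i}$ collects the remaining terms of (49) and is held fixed. This is an assumption that drops the $Λ_{i}$ term and the dependence of ${\overset{\cdot}{s}}_{i}$ on ${\hat{W}}_{ic}$; under it, $\partial E_{iJ} / \partial {\hat{W}}_{ic} = e_{iJ} υ_{i}$, and one Euler step of (50) with step $Δt$ multiplies the residual by $1 - α_{i} Δt ‖υ_{i}‖^{2}$.

```python
from typing import List, Tuple


def residual(w_hat: List[float], c: float, upsilon: List[float]) -> float:
    """Simplified residual e_iJ = c + W_hat^T upsilon (see stated assumptions)."""
    return c + sum(w * v for w, v in zip(w_hat, upsilon))


def critic_step(w_hat: List[float], c: float, upsilon: List[float],
                lr: float, dt: float) -> Tuple[List[float], float]:
    """One forward-Euler step of (50): W_hat <- W_hat - lr * e_iJ * upsilon * dt.
    Returns the updated weights and the residual before the step."""
    e = residual(w_hat, c, upsilon)
    new_w = [w - lr * e * v * dt for w, v in zip(w_hat, upsilon)]
    return new_w, e


w_hat = [0.0, 0.0]
upsilon = [0.8, -0.3]   # a single sample of upsilon_i = grad(delta_ic) * s_dot
c = 0.5                 # hypothetical value of the remaining terms of (49)
new_w, e_before = critic_step(w_hat, c, upsilon, lr=5.0, dt=0.01)
e_after = residual(new_w, c, upsilon)
```

The residual shrinks as long as $0 < α_{i} Δt ‖υ_{i}‖^{2} < 2$, which is one practical constraint on choosing the learning rate.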

Define the weight approximation error as

{\tilde{W}}_{ic} = W_{ic} - {\hat{W}}_{ic}

(51)

Then, according to equations (46), (49), and (50), we conclude that

e_{iJ} = e_{iJh} - {\tilde{W}}_{ic}^{T} υ_{i}

(52)

The dynamics of the weight approximation error can be given as

{\overset{\cdot}{\tilde{W}}}_{ic} = - {\overset{\cdot}{\hat{W}}}_{ic} = α_{i} e_{iJ} υ_{i} = α_{i} (e_{iJh} - {\tilde{W}}_{ic}^{T} υ_{i}) υ_{i}

(53)

According to equations (40) and (44), the desired optimal control policy can be formulated as

u_{i 2}^{*} (s_{i}) = - \frac{1}{2} R_{i}^{- 1} g_{i}^{T} (x_{i}) ({(\nabla δ_{ic} (s_{i}))}^{T} W_{ic} + \nabla ε_{ic})

(54)

and it can be approximated as

{\hat{u}}_{i 2}^{*} (s_{i}) = - \frac{1}{2} R_{i}^{- 1} g_{i}^{T} (x_{i}) ({(\nabla δ_{ic} (s_{i}))}^{T} {\hat{W}}_{ic})

(55)
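A minimal sketch of evaluating (55) for a scalar joint subsystem is given below. It assumes that $g_{i} (x_{i})$ is available as a number at the current state and reuses a hypothetical activation $[s^{2}, s^{4}, s^{6}]$; the function names are illustrative. The value $R = 0.1$ matches $R_{i}$ in the simulation setup, while $g = 2.0$ is arbitrary.

```python
from typing import List


def activation_grad(s: float) -> List[float]:
    """Derivative of the hypothetical activation [s^2, s^4, s^6]."""
    return [2 * s, 4 * s**3, 6 * s**5]


def approx_optimal_control(w_hat: List[float], s: float, g: float, R: float) -> float:
    """u_hat = -(1/2) R^{-1} g (grad delta_ic(s))^T W_hat, cf. equation (55), scalar case."""
    grad_J = sum(w * d for w, d in zip(w_hat, activation_grad(s)))
    return -0.5 * (1.0 / R) * g * grad_J


u = approx_optimal_control([1.0, 0.5, 0.1], s=0.3, g=2.0, R=0.1)
print(u)
```

Only the critic weights enter the computation, so no separate action network appears in the sketch.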

Note that equation (55) is obtained from the critic NN alone, so no action NN needs to be trained, which effectively reduces the computational burden.

Theorem 2

Consider the cost function in equation (43), which is approximated by the single-layer critic NN, and the estimated cost function in equation (47), which is constructed with ${\hat{W}}_{ic}$ . If the critic NN weights are updated by equation (50), then the weight approximation error defined in equation (51) is uniformly ultimately bounded (UUB).

Proof

Choose the Lyapunov function candidate as

V_{ic} (t) = \frac{1}{2 α_{i}} {\tilde{W}}_{ic}^{T} {\tilde{W}}_{ic}

(56)

The time derivative of $V_{ic} (t)$ is

\begin{matrix} {\overset{\cdot}{V}}_{ic} (t) & = \frac{1}{α_{i}} {\tilde{W}}_{ic}^{T} {\overset{\cdot}{\tilde{W}}}_{ic} = {\tilde{W}}_{ic}^{T} (e_{iJh} - {\tilde{W}}_{ic}^{T} υ_{i}) υ_{i} \\ = {\tilde{W}}_{ic}^{T} e_{iJh} υ_{i} - ‖ {\tilde{W}}_{ic}^{T} υ_{i} ‖^{2} \leq \frac{1}{2} ‖ e_{iJh} ‖^{2} - \frac{1}{2} ‖ {\tilde{W}}_{ic}^{T} υ_{i} ‖^{2} \end{matrix}

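where the inequality follows from Young's inequality, ${\tilde{W}}_{ic}^{T} e_{iJh} υ_{i} \leq \frac{1}{2} ‖ e_{iJh} ‖^{2} + \frac{1}{2} ‖ {\tilde{W}}_{ic}^{T} υ_{i} ‖^{2}$ . Hence, ${\overset{\cdot}{V}}_{ic} (t) \leq 0$ while ${\tilde{W}}_{ic}$ lies outside the compact set $Ω_{i 1} = \{ {\tilde{W}}_{ic} : ‖ {\tilde{W}}_{ic} ‖ \leq ‖ e_{iJh} ‖ / υ_{iL} \}$ . This completes the proof.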
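To illustrate Theorem 2 numerically, the sketch below integrates the weight-error dynamics (53) with forward Euler. The regressor $υ_{i} (t) = [1, \sin t, \cos t]^{T}$, the learning rate, the initial error, and the residual signals are hypothetical; the regressor is chosen to be persistently exciting, a condition commonly assumed in critic-convergence results of this kind.

```python
import math
from typing import Callable, List


def simulate_weight_error(w_tilde0: List[float], alpha: float,
                          e_h: Callable[[float], float],
                          t_end: float = 60.0, dt: float = 0.01) -> List[float]:
    """Forward-Euler integration of W_tilde_dot = alpha (e_iJh - W_tilde^T v) v, cf. (53)."""
    w = list(w_tilde0)
    for k in range(int(round(t_end / dt))):
        t = k * dt
        v = [1.0, math.sin(t), math.cos(t)]  # hypothetical persistently exciting upsilon_i(t)
        err = e_h(t) - sum(wi * vi for wi, vi in zip(w, v))  # e_iJ as in (52)
        w = [wi + dt * alpha * err * vi for wi, vi in zip(w, v)]
    return w


def norm(x: List[float]) -> float:
    """Euclidean norm of a weight-error vector."""
    return math.sqrt(sum(xi * xi for xi in x))


w0 = [1.0, -2.0, 0.5]  # hypothetical initial weight error
w_exact = simulate_weight_error(w0, alpha=0.5, e_h=lambda t: 0.0)
w_noisy = simulate_weight_error(w0, alpha=0.5, e_h=lambda t: 0.05 * math.sin(3 * t))
print(norm(w0), norm(w_exact), norm(w_noisy))
```

With a zero residual the weight error decays toward zero; with a bounded residual it remains in a small neighborhood of the origin, which is the UUB behavior stated in the theorem.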

Combining equations (42) and (55), $u_{i}^{*}$ is given as

\begin{matrix} u_{i}^{*} = - \frac{1}{2} R_{i}^{- 1} g_{i}^{T} (x_{i}) ({(\nabla δ_{ic} (s_{i}))}^{T} {\hat{W}}_{ic}) - \\ g_{i} (x_{i}) ({\hat{w}}_{ih}^{T} σ_{ih} + k_{ih} e_{ih} + \int_{0}^{t} (\begin{matrix} (k_{ih} α_{ih} + γ_{ih}) e_{ih} \\ + δ_{i 1} sgn (e_{ih}) \end{matrix}) dt) \end{matrix}

(57)
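The control law (57) combines the critic-based term (55) with an identifier-based compensation that contains an integral. The following discrete-time sketch assumes a scalar joint subsystem, forward-Euler accumulation of the integral, and that $g_{i} (x_{i})$, the identifier output ${\hat{w}}_{ih}^{T} σ_{ih}$, the optimal term ${\hat{u}}_{i 2}^{*}$, and the error $e_{ih}$ are supplied at each sample. The class and argument names are hypothetical; the gains are the values of $k_{ih}$, $α_{ih}$, $γ_{ih}$, and $δ_{i 1}$ listed in the simulation setup.

```python
from dataclasses import dataclass


def sgn(x: float) -> float:
    """Signum function used in the integral term of (57)."""
    return float((x > 0) - (x < 0))


@dataclass
class DecentralizedSMC:
    """Hypothetical discrete-time evaluation of control law (57) for one joint."""
    k_h: float       # k_ih
    alpha_h: float   # alpha_ih
    gamma_h: float   # gamma_ih
    delta_1: float   # delta_i1
    integral: float = 0.0  # running value of the integral term in (57)

    def control(self, u_opt: float, g: float, nn_out: float, e_h: float, dt: float) -> float:
        # u_i^* = u_hat_i2^* - g_i(x_i) (w_hat^T sigma + k_ih e_ih + integral)
        u = u_opt - g * (nn_out + self.k_h * e_h + self.integral)
        # Forward-Euler accumulation of (k_ih alpha_ih + gamma_ih) e_ih + delta_i1 sgn(e_ih)
        integrand = (self.k_h * self.alpha_h + self.gamma_h) * e_h + self.delta_1 * sgn(e_h)
        self.integral += integrand * dt
        return u


ctrl = DecentralizedSMC(k_h=800.0, alpha_h=350.0, gamma_h=5.0, delta_1=0.5)
u0 = ctrl.control(u_opt=-1.0, g=0.5, nn_out=0.2, e_h=0.01, dt=1e-3)
print(u0, ctrl.integral)
```

Keeping the integral as object state lets the same controller instance be called once per sample in a simulation loop.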

Theorem 3

Consider the n-DOF MRR system whose subsystem dynamics take the form of equation (2) and are completely unknown when the optimal controller is designed. If the decentralized optimal SMC law in equation (57) is applied to the MRR system, then the closed-loop robotic system is asymptotically stable.

Proof

Choose the Lyapunov function candidate

V (t) = \sum_{i = 1}^{n} V_{i} (t) = \sum_{i = 1}^{n} (\frac{1}{2} s_{i}^{T} s_{i} + J_{i}^{*} (s_{i}))

(58)

According to equations (32), (42), and (53), the time derivative of $V (t)$ is

\begin{matrix} \overset{\cdot}{V} (t) & = \sum_{i = 1}^{n} {\overset{\cdot}{V}}_{i} (t) = \sum_{i = 1}^{n} ({s_{i}}^{T} {\overset{\cdot}{s}}_{i} + {(\nabla J_{i}^{*} (s_{i}))}^{T} {\overset{\cdot}{s}}_{i}) \\ = \sum_{i = 1}^{n} ({s_{i}}^{T} (\begin{matrix} F_{i} (x_{i}, x_{jd}) + g_{i} (x_{i}) u_{i} \\ - {\overset{\cdot\cdot}{x}}_{i 1 d} + k_{i} {\overset{\cdot}{e}}_{i} + Δ ψ_{i} \end{matrix}) + {(\nabla J_{i}^{*} (s_{i}))}^{T} {\overset{\cdot}{s}}_{i}) \\ = \sum_{i = 1}^{n} (\begin{matrix} {s_{i}}^{T} (\begin{matrix} g_{i} (x_{i}) u_{i} + F_{i} (x_{i}, x_{jd}) \\ - {\overset{\cdot\cdot}{x}}_{i 1 d} + k_{i} {\overset{\cdot}{e}}_{i} + Δ ψ_{i} \end{matrix}) - s_{i}^{T} Q_{i} s_{i} \\ - u_{i}^{T} R_{i} u_{i} - ϒ_{i} {(\nabla J_{i}^{*} (s_{i}))}^{2} - H_{i}^{T} H_{i} \end{matrix}) \end{matrix}

(59)

Since $F_{i} (x_{i}, x_{jd})$ and $g_{i} (x_{i})$ are Lipschitz, there exist positive constants $L_{if}$ and $w_{ig}$ such that $‖ F_{i} ‖ \leq L_{if} ‖ s_{i} ‖$ and $‖ g_{i} ‖ \leq w_{ig}$ . Therefore, the time derivative in equation (59) satisfies

\begin{matrix} \overset{\cdot}{V} (t) & = \sum_{i = 1}^{n} ({\overset{\cdot}{V}}_{i} (t)) \\ \leq \sum_{i = 1}^{n} (\begin{matrix} L_{if} {‖ s_{i} ‖}^{2} + w_{ig} ‖ s_{i} ‖ ‖ u_{i} ‖ + v_{i} ‖ s_{i} ‖ + H_{i} ‖ s_{i} ‖ \\ - s_{i}^{T} Q_{i} s_{i} - u_{i}^{T} R_{i} u_{i} - ϒ_{i} {(\nabla J_{i}^{*} (s_{i}))}^{2} - H_{i}^{T} H_{i} \end{matrix}) \\ \leq \sum_{i = 1}^{n} (\begin{matrix} L_{if} {‖ s_{i} ‖}^{2} + w_{ig} ‖ s_{i} ‖ ‖ u_{i} ‖ + v_{i} ‖ s_{i} ‖ + H_{i} ‖ s_{i} ‖ \\ - λ_{min} (Q_{i}) {‖ s_{i} ‖}^{2} - H_{i}^{T} H_{i} \\ - λ_{min} (R_{i}) {‖ u_{i} ‖}^{2} - ϒ_{i} {(\nabla J_{i}^{*} (s_{i}))}^{2} \end{matrix}) \\ \leq \sum_{i = 1}^{n} (\begin{matrix} L_{if} {‖ s_{i} ‖}^{2} + \frac{1}{2} {w_{ig}}^{2} {‖ s_{i} ‖}^{2} + \frac{1}{2} {‖ u_{i} ‖}^{2} + v_{i} ‖ s_{i} ‖ \\ - λ_{min} (Q_{i}) {‖ s_{i} ‖}^{2} - λ_{min} (R_{i}) {‖ u_{i} ‖}^{2} \\ + H_{i} ‖ s_{i} ‖ - H_{i}^{T} H_{i} - ϒ_{i} {(\nabla J_{i}^{*} (s_{i}))}^{2} \end{matrix}) \\ = \sum_{i = 1}^{n} (\begin{matrix} [(\frac{1}{2} {w_{ig}}^{2} - λ_{min} (Q_{i}) + L_{if}) ‖ s_{i} ‖ + H_{i} + v_{i}] \\ \cdot ‖ s_{i} ‖ - (- \frac{1}{2} + λ_{min} (R_{i})) {‖ u_{i} ‖}^{2} \\ - H_{i}^{T} H_{i} - ϒ_{i} {(\nabla J_{i}^{*} (s_{i}))}^{2} \end{matrix}) \end{matrix}

(60)

where $v_{i} = - {\overset{\cdot\cdot}{x}}_{i 1 d} + k_{i} {\overset{\cdot}{e}}_{i} (t)$ and $k_{i}$ is a positive constant. Suppose there exist positive constants $κ_{i}, ε_{i}, ς_{i}$ such that $‖ {\overset{\cdot\cdot}{x}}_{i 1 d} ‖ \leq ε_{i}, ‖ H_{i} ‖ \leq κ_{i}, ‖ {\overset{\cdot}{e}}_{i} ‖ \leq ς_{i}$ . Then, one obtains

\begin{matrix} \overset{\cdot}{V} (t) & = \sum_{i = 1}^{n} ({\overset{\cdot}{V}}_{i} (t)) \\ \leq \sum_{i = 1}^{n} (\begin{matrix} - (- δ_{i} + ϕ_{i 1} ‖ s_{i} ‖) ‖ s_{i} ‖ - ϕ_{i 2} {‖ u_{i} ‖}^{2} \\ - H_{i}^{T} H_{i} - ϒ_{i} {(\nabla J_{i}^{*} (s_{i}))}^{2} \end{matrix}) \end{matrix}

(61)

where $ϕ_{i 1} = λ_{min} (Q_{i}) - L_{if} - {w_{ig}}^{2} / 2$ , $ϕ_{i 2} = λ_{min} (R_{i}) - 1 / 2$ , and $δ_{i} = ε_{i} + k_{i} ς_{i} + κ_{i}$ . Hence, ${\overset{\cdot}{V}}_{i} (t) \leq 0$ whenever $s_{i}$ lies outside the compact set $Ω_{i 2} = \{ s_{i} : ‖ s_{i} ‖ \leq (ε_{i} + k_{i} ς_{i} + κ_{i}) / (λ_{min} (Q_{i}) - L_{if} - (1 / 2) w_{ig}^{2}) \}$ , provided that the following conditions hold

λ_{min} (Q_{i}) > L_{if} + w_{ig}^{2} / 2, λ_{min} (R_{i}) \geq 1 / 2

(62)

Moreover, when condition (62) is satisfied, equation (61) implies that $\overset{\cdot}{V} (t) \leq 0$ for $s_{i} \neq 0$ . According to Lyapunov theory, the closed-loop system under the optimal SMC is therefore asymptotically stable. This completes the proof.
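Once bounds on the unknown terms are available, the gain conditions (62) and the radius of $Ω_{i 2}$ can be checked directly. The sketch below uses hypothetical values for $λ_{min} (Q_{i})$, $λ_{min} (R_{i})$, $L_{if}$, $w_{ig}$, $ε_{i}$, $k_{i}$, $ς_{i}$, and $κ_{i}$; in practice these Lipschitz constants and bounds would have to be estimated for the actual robot. The function names are illustrative.

```python
from typing import Tuple


def gain_margins(lam_min_Q: float, lam_min_R: float, L_f: float, w_g: float) -> Tuple[float, float]:
    """Return (phi_i1, phi_i2) as defined after equation (61)."""
    phi1 = lam_min_Q - L_f - 0.5 * w_g**2
    phi2 = lam_min_R - 0.5
    return phi1, phi2


def ultimate_bound(eps: float, k: float, varsigma: float, kappa: float, phi1: float) -> float:
    """Radius (eps_i + k_i varsigma_i + kappa_i) / phi_i1 of the compact set Omega_i2."""
    if phi1 <= 0.0:
        raise ValueError("phi_i1 must be positive for Omega_i2 to be well defined")
    return (eps + k * varsigma + kappa) / phi1


# Hypothetical constants, chosen only to illustrate the computation.
phi1, phi2 = gain_margins(lam_min_Q=10.0, lam_min_R=1.0, L_f=2.0, w_g=2.0)
print(phi1, phi2, ultimate_bound(eps=0.8, k=1.0, varsigma=0.5, kappa=0.2, phi1=phi1))
```

Increasing $λ_{min} (Q_{i})$ enlarges $ϕ_{i 1}$ and therefore shrinks the set $Ω_{i 2}$.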

Remark 2

To overcome the difficulty of solving the HJB equation directly, a local PI method is adopted following previous literature.^33–35 The iterative PI procedure based on equation (9) is given in Appendix 1.
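The article's local PI procedure is given in its Appendix 1 and is not reproduced here. To show the generic policy-evaluation and policy-improvement cycle that PI performs, the sketch below applies it to a scalar linear-quadratic problem, where policy evaluation reduces to a scalar Lyapunov equation (Kleinman's algorithm). This is a textbook stand-in, not the authors' method; the plant, weights, and initial gain are hypothetical, and the initial policy must be stabilizing.

```python
import math


def policy_iteration(a: float, b: float, q: float, r: float, k0: float, iters: int = 20):
    """Policy iteration for x_dot = a x + b u with cost integral of (q x^2 + r u^2) dt.

    The policy is u = -k x and its value function is J(x) = p x^2.
    """
    k = k0
    p = float("nan")
    for _ in range(iters):
        a_cl = a - b * k
        if a_cl >= 0.0:
            raise ValueError("policy u = -k x is not stabilizing")
        # Policy evaluation: p solves the Lyapunov equation 2 a_cl p + q + r k^2 = 0.
        p = -(q + r * k * k) / (2.0 * a_cl)
        # Policy improvement: u = -(1/2) r^{-1} b dJ/dx = -(b p / r) x.
        k = b * p / r
    return p, k


p, k = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
# Positive root of the Riccati equation 2 a p - (b^2 / r) p^2 + q = 0 for these values.
p_riccati = 1.0 + math.sqrt(2.0)
print(p, k, p_riccati)
```

The improvement step has the same structure as equation (54), $u = -\frac{1}{2} R^{-1} g^{T} \nabla J$; in this scalar case the iterates converge to the positive root of the algebraic Riccati equation.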

Simulations

Simulation setup

In this section, two MRR configurations are simulated. For configuration A, the dynamics are given as

\begin{matrix} M (q) = [\begin{matrix} 0.6066 + 0.36 \cos (q_{2}) & 0.1233 + 0.18 \cos (q_{2}) \\ 0.1233 + 0.18 \cos (q_{2}) & 0.1233 \end{matrix}] \\ C (q, \overset{\cdot}{q}) = [\begin{matrix} - 0.36 \sin (q_{2}) {\overset{\cdot}{q}}_{2} & - 0.18 {\overset{\cdot}{q}}_{2} \sin (q_{2}) \\ 0.18 \sin (q_{2}) ({\overset{\cdot}{q}}_{1} - {\overset{\cdot}{q}}_{2}) & 0.18 {\overset{\cdot}{q}}_{1} \sin (q_{2}) \end{matrix}] \\ G (q) = [\begin{matrix} - 5.88 \sin (q_{1} + q_{2}) - 17.64 \sin (q_{1}) \\ - 5.88 \sin (q_{1} + q_{2}) \end{matrix}] \end{matrix}

and those for configuration B are given as

\begin{matrix} M (q) = [\begin{matrix} 0.17 - 0.1166 \cos^{2} (q_{2}) & - 0.06 \cos (q_{2}) \\ - 0.06 \cos (q_{2}) & 0.1233 \end{matrix}] \\ C (q, \overset{\cdot}{q}) = [\begin{matrix} - 0.1166 {\overset{\cdot}{q}}_{2} \sin (2 q_{2}) & - 0.06 \sin (q_{2}) {\overset{\cdot}{q}}_{2} \\ (\begin{matrix} 0.06 \sin (q_{2}) {\overset{\cdot}{q}}_{2} \\ - 0.0583 \sin (2 q_{2}) {\overset{\cdot}{q}}_{1} \end{matrix}) & - 0.06 {\overset{\cdot}{q}}_{1} \sin (q_{2}) \end{matrix}] \\ G (q) = [\begin{matrix} 0 \\ - 5.88 \sin (q_{1} + q_{2}) \end{matrix}] \end{matrix}
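For readers who want to reproduce the simulation, the matrices of both configurations can be coded directly. The sketch below assumes the standard rigid-body form $M (q) \overset{\cdot\cdot}{q} + C (q, \overset{\cdot}{q}) \overset{\cdot}{q} + G (q) = τ$, which is not restated in this section, and inverts the 2×2 inertia matrix in plain Python; the function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Mat = List[List[float]]
Vec = List[float]
Model = Callable[[Vec, Vec], Tuple[Mat, Mat, Vec]]


def config_a(q: Vec, dq: Vec) -> Tuple[Mat, Mat, Vec]:
    """M(q), C(q, dq), G(q) of configuration A, copied from the matrices above."""
    c2, s2 = math.cos(q[1]), math.sin(q[1])
    M = [[0.6066 + 0.36 * c2, 0.1233 + 0.18 * c2],
         [0.1233 + 0.18 * c2, 0.1233]]
    C = [[-0.36 * s2 * dq[1], -0.18 * dq[1] * s2],
         [0.18 * s2 * (dq[0] - dq[1]), 0.18 * dq[0] * s2]]
    G = [-5.88 * math.sin(q[0] + q[1]) - 17.64 * math.sin(q[0]),
         -5.88 * math.sin(q[0] + q[1])]
    return M, C, G


def config_b(q: Vec, dq: Vec) -> Tuple[Mat, Mat, Vec]:
    """M(q), C(q, dq), G(q) of configuration B, copied from the matrices above."""
    c2, s2, s22 = math.cos(q[1]), math.sin(q[1]), math.sin(2 * q[1])
    M = [[0.17 - 0.1166 * c2**2, -0.06 * c2],
         [-0.06 * c2, 0.1233]]
    C = [[-0.1166 * dq[1] * s22, -0.06 * s2 * dq[1]],
         [0.06 * s2 * dq[1] - 0.0583 * s22 * dq[0], -0.06 * dq[0] * s2]]
    G = [0.0, -5.88 * math.sin(q[0] + q[1])]
    return M, C, G


def forward_dynamics(model: Model, q: Vec, dq: Vec, tau: Vec) -> Vec:
    """Solve M(q) ddq = tau - C(q, dq) dq - G(q) for ddq (assumed standard form)."""
    M, C, G = model(q, dq)
    rhs = [tau[i] - (C[i][0] * dq[0] + C[i][1] * dq[1]) - G[i] for i in range(2)]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * rhs[0] - M[0][1] * rhs[1]) / det,
            (-M[1][0] * rhs[0] + M[0][0] * rhs[1]) / det]


# Joint accelerations of configuration A released from rest with zero torque.
print(forward_dynamics(config_a, [0.3, -0.7], [0.0, 0.0], [0.0, 0.0]))
```

Both inertia matrices are symmetric with a positive determinant for all $q_{2}$, so the 2×2 solve above is always well defined.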

For both configurations A and B, the desired trajectories are

\left\{ \begin{matrix} x_{1 d} = 0.4 \sin t + 0.1 \cos (2 t) \\ x_{2 d} = 0.2 \cos (3 t) - 0.3 \sin (2 t) \end{matrix} \right.
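The sliding dynamics and the bound $ε_{i}$ in the proof of Theorem 3 involve the desired accelerations ${\overset{\cdot\cdot}{x}}_{i 1 d}$, which we identify here with the second derivatives of $x_{1 d}$ and $x_{2 d}$ (an assumption about notation). They follow analytically from the trajectories above; the helper below (name hypothetical) returns position, velocity, and acceleration for both joints. By the triangle inequality, $|{\overset{\cdot\cdot}{x}}_{1 d}| \leq 0.8$ and $|{\overset{\cdot\cdot}{x}}_{2 d}| \leq 3.0$, so these are admissible, if conservative, choices of $ε_{1}$ and $ε_{2}$.

```python
import math
from typing import Tuple

Triple = Tuple[float, float, float]


def desired(t: float) -> Tuple[Triple, Triple]:
    """Return ((x, x_dot, x_ddot) of joint 1, (x, x_dot, x_ddot) of joint 2) at time t."""
    x1 = 0.4 * math.sin(t) + 0.1 * math.cos(2 * t)
    dx1 = 0.4 * math.cos(t) - 0.2 * math.sin(2 * t)
    ddx1 = -0.4 * math.sin(t) - 0.4 * math.cos(2 * t)
    x2 = 0.2 * math.cos(3 * t) - 0.3 * math.sin(2 * t)
    dx2 = -0.6 * math.sin(3 * t) - 0.6 * math.cos(2 * t)
    ddx2 = -1.8 * math.cos(3 * t) + 1.2 * math.sin(2 * t)
    return (x1, dx1, ddx1), (x2, dx2, ddx2)


print(desired(0.0))
```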

The estimated critic NN weight vector is ${\hat{W}}_{ic} = [{\hat{W}}_{ic 1}, {\hat{W}}_{ic 2}, {\hat{W}}_{ic 3}]^{T}$ . For configuration A, the initial values are ${\hat{W}}_{1 c 0} = [\begin{matrix} 20 & 15 & 10 \end{matrix}]^{T}$ and ${\hat{W}}_{2 c 0} = [\begin{matrix} 20 & 25 & 30 \end{matrix}]^{T}$ ; for configuration B, they are ${\hat{W}}_{1 c 0} = [\begin{matrix} 20 & 25 & 30 \end{matrix}]^{T}$ and ${\hat{W}}_{2 c 0} = [\begin{matrix} 20 & 25 & 30 \end{matrix}]^{T}$ . The activation function is chosen as $δ_{ic} = [s_{i}^{2}, s_{i}^{2}, s_{i}^{2}]$ . The control parameters are selected as $k_{ih} = 800, α_{ih} = 350, R_{i} = Q_{i} = 0.1 I, γ_{ih} = 5, Γ_{ih} = 0.1 I, δ_{i 1} = 0.5$ .

Simulation results

The simulation results evaluate joint position tracking, position tracking errors, control torques, and the convergence of the NN weights. The simulations were run on a host computer with an Intel Core i7-7700 CPU (3.60 GHz) and 8.00 GB RAM, using MATLAB 2016a under Windows 10. Two control methods are compared: an existing NN-based optimal control method, such as that of Dong and colleagues,^23,24 and the proposed ADP-based decentralized optimal SMC method.

Figure 1 shows the joint position tracking curves of configuration A under the existing method. Obvious chattering can be observed during the first 2 s, which is caused by the lack of dynamic decomposition and optimal compensation of the IDCs. Figure 2 shows the joint position curves of configuration A under the proposed method. Compared with Figure 1, the desired trajectories are tracked within a very short time, which verifies the effectiveness of the proposed method. Figures 3 and 4 show the joint position tracking errors of configuration A under the existing and proposed methods, respectively. In Figure 3, the position errors are large during the first few seconds, and the steady-state error amplitude reaches ±0.02 rad. In Figure 4, the position errors converge very quickly to nearly zero.

Figure 1.

Position tracking for configuration A with existing methods: (a) Joint 1 and (b) Joint 2.

Figure 2.

Position tracking for configuration A with proposed methods: (a) Joint 1 and (b) Joint 2.

Figure 3.

Position error for configuration A with existing methods: (a) Joint 1 and (b) Joint 2.

Figure 4.

Position error for configuration A with proposed methods: (a) Joint 1 and (b) Joint 2.

Figures 5 and 6 show the velocity tracking curves of configuration A under the existing and proposed methods, respectively, and Figures 7 and 8 show the corresponding velocity errors. Because the existing method does not compensate for the IDC effects, its tracking error is larger than that of the proposed method.

Figure 5.

Velocity tracking for configuration A with existing methods: (a) Joint 1 and (b) Joint 2.

Figure 6.

Velocity tracking for configuration A with proposed methods: (a) Joint 1 and (b) Joint 2.

Figure 7.

Velocity error for configuration A with existing methods: (a) Joint 1 and (b) Joint 2.

Figure 8.

Velocity error for configuration A with proposed methods: (a) Joint 1 and (b) Joint 2.

Figure 9 shows the control torques of configuration A under the existing method. For joint 1, the initial control torque is large and may overload the motors, because the NNs need time to learn before they can supply the required torque. For joint 2, the control torque exhibits sudden oscillations at certain moments, because the method does not compensate for the IDCs. Figure 10 shows the control torques of configuration A under the proposed scheme. The output torques are optimized and exhibit an appropriate profile that may match the output power of the motors.

Figure 9.

Control torque for configuration A with existing methods: (a) Joint 1 and (b) Joint 2.

Figure 10.

Control torque for configuration A with proposed methods: (a) Joint 1 and (b) Joint 2.

Figure 11 shows the critic NN weights for joints 1 and 2 of configuration A under the proposed method. Owing to the PI procedure and critic NN training, the weights converge within 1 s, to $W_{c 1} = [\begin{matrix} 7.2897 & 2.2897 & - 2.7103 \end{matrix}]^{T}$ and $W_{c 2} = [\begin{matrix} - 4.9820 & 0.0180 & 5.0180 \end{matrix}]^{T}$ . Unlike conventional ADP-based optimal control methods, which rely on both action and critic NNs, the proposed method obtains the optimal control from the critic NN alone, so no action NN needs to be trained and the computational burden is effectively reduced.

Figure 11.

Critic NN for configuration A with proposed methods: (a) Joint 1 and (b) Joint 2.

Figures 12–22 show the position tracking, position errors, velocity tracking, velocity errors, control torques, and critic weight convergence for configuration B. The results are similar to those of configuration A, which indicates that the proposed method remains effective after reconfiguration without retuning the control parameters. The weights of the critic NN converge to $W_{c 1} = [\begin{matrix} 19.0669 & 23.9518 & 28.8368 \end{matrix}]^{T}$ and $W_{c 2} = [\begin{matrix} - 4.9960 & 0.0040 & 5.0040 \end{matrix}]^{T}$ .

Figure 12.

Position tracking for configuration B with existing methods: (a) Joint 1 and (b) Joint 2.

Figure 13.

Position tracking for configuration B with proposed methods: (a) Joint 1 and (b) Joint 2.

Figure 14.

Position error for configuration B with existing methods: (a) Joint 1 and (b) Joint 2.

Figure 15.

Position error for configuration B with proposed methods: (a) Joint 1 and (b) Joint 2.

Figure 16.

Velocity tracking for configuration B with existing methods: (a) Joint 1 and (b) Joint 2.

Figure 17.

Velocity tracking for configuration B with proposed methods: (a) Joint 1 and (b) Joint 2.

Figure 18.

Velocity error for configuration B with existing methods: (a) Joint 1 and (b) Joint 2.

Figure 19.

Velocity error for configuration B with proposed methods: (a) Joint 1 and (b) Joint 2.

Figure 20.

Control torque for configuration B with existing methods: (a) Joint 1 and (b) Joint 2.

Figure 21.

Control torque for configuration B with proposed methods: (a) Joint 1 and (b) Joint 2.

Figure 22.

Critic NN for configuration B with proposed methods: (a) Joint 1 and (b) Joint 2.

The simulation results show that the proposed decentralized optimal SMC method guarantees both stability and tracking accuracy.

Conclusion

This article proposes a model-free decentralized SMC method for MRR systems via PI scheme-based ADP. First, the dynamic model is formulated as a synthesis of joint subsystems with IDC effects. Then, based on the SMC technique, the decentralized optimal control problem is transformed into the optimal compensation of the unknown dynamics of each subsystem. Based on ADP and PI, the HJB equation is solved using a critic NN, from which the optimal control policy is obtained. According to Lyapunov theory, the closed-loop MRR system is guaranteed to be UUB. Finally, simulations verify the effectiveness of the method.

ADP-based control methods have been successfully applied, in theory, to optimal control problems in battery management, residential energy management, the water–gas shift reaction, and the coal gasification process. However, the effectiveness of these works has been established only through numerical analyses and simulation results. Indeed, a commonly acknowledged shortcoming of research on ADP-based optimal control is the lack of experimental studies and practical applications on physical systems, especially robotic systems. Building an experimental robotic platform that satisfies real-time and accuracy requirements is a key problem for implementing the proposed decentralized SMC method on actual MRR systems, and it is also the topic of our future research.

Footnotes

Appendix 1

Handling Editor: James Baldwin

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China (grant nos 61374051, 61773075, and 617030555), the Scientific Technological Development Plan Project in Jilin Province of China (grant nos 20160520013JH, 20190103004JH, and 20160414033GH), and Project of the Engineering Laboratory of Intelligent Robot and Vision Measurement and Control Technology in Jilin Province (2019C010).

ORCID iD

Bo Dong

References

1. Paredis CJJ, Brown, Khosla. A rapidly deployable manipulator system. In: Proceedings of the IEEE international conference on robotics & automation, Minneapolis, MN, 22–28 April 1996. New York: IEEE.

2. Saleh, Fairouz. A novel robust adaptive second-order sliding mode tracking control technique for uncertain dynamical systems with matched and unmatched disturbances. Int J Control Automat Syst 2017; 15: 1097–1106.

3. Donya, Saleh. Design of an adaptive super-twisting decoupled terminal sliding mode control scheme for a class of fourth-order systems. ISA Trans 2018; 75: 216–225.

4. Saleh, Fairouz. Robust global second-order sliding mode control with adaptive parameter-tuning law for perturbed dynamical systems. Trans Inst Meas Control 2018; 40: 2855–2867.

5. Liu, Chen, Zhang. Second order sliding mode control of pan-tilt joint in modular manipulator. In: Proceedings of the 10th world congress on intelligent control and automation, Beijing, China, 6–8 July 2012, pp. 2188–2193. New York: IEEE.

6. Liu. Sliding mode adaptive neural-network control for nonholonomic mobile modular manipulators. J Intell Robot Syst 2005; 44: 203–224.

7. Deng, et al. Decentralised adaptive control of cooperating robotic manipulators with disturbance observers. IET Control Theory Appl 2014; 8: 515–521.

8. Yang, et al. Decentralized fuzzy control of multiple cooperating robotic manipulators with impedance interaction. IEEE Trans Fuzzy Syst 2015; 4: 1044–1056.

9. Llama, Flores, Santibannez, et al. Global convergence of a decentralized adaptive fuzzy control for the motion of robot manipulators: application to the Mitsubishi PA10-7CE as a case of study. J Intell Robot Syst 2016; 82: 363–377.

10. Zhu. Decentralized adaptive fuzzy sliding mode control for reconfigurable modular manipulators. Int J Robust Nonlin 2010; 20: 472–488.

11. Slotine, Sastry SS. Tracking control of nonlinear systems using sliding surfaces with application to robot manipulator. Int J Control 1983; 38: 465–492.

12. Ficola, Cava ML. A sliding mode controller for a two-joint robot with an elastic link. Math Comput Simul 1996; 41: 559–569.

13. Bellman RE. Dynamic programming. Princeton, NJ: Princeton University Press, 1957.

14. Pontryagin LS. Optimal control processes. Uspehi Mat Nauk 1959; 14: 3–20.

15. Lewis, Syrmos VL. Optimal control. New York: Wiley, 1995.

16. Abu-Khalaf, Lewis FL. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 2005; 41: 779–791.

17. Pan, Xin. Nonlinear robust and optimal control of robot manipulators. Nonlinear Dyn 2014; 76: 237–254.

18. Lewis, Selmic, Campos. Neuro-fuzzy control of industrial systems with actuator nonlinearities. New York: ACM, 2002.

19. Bhasin, Kamalapurkar, Vamvoudakis, et al. A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 2013; 49: 82–92.

20. Nageshrao, Lopes, Jeltsema, et al. Passivity based reinforcement learning control of a 2-DOF manipulator arm. Mechatronics 2014: 1001–1007.

21. Tang, Liu, Tong. Adaptive neural control using reinforcement learning for a class of robot manipulator. Neural Comput Appl 2014; 25: 135–141.

22. Melek, Clark. Decentralized robust control of robot manipulators with harmonic drive transmission and application to modular and reconfigurable serial arms. Robotica 2009; 27: 291–302.

23. Dong. Decentralized reinforcement learning robust optimal tracking control for time varying constrained reconfigurable modular robot based on ACI and Q-function. Math Probl Eng 2013; 2013: 387817.

24. Dong, Liu. Decentralized control of harmonic drive based modular robot manipulator using only position measurements: theory and experimental verification. J Intell Robot Syst 2017; 88: 3–18.

25. Zhao. Local joint information based active fault tolerant control for reconfigurable manipulator. Nonlinear Dyn 2014; 77: 859–876.

26. Zhou, Dong. Torque sensorless force/position decentralized control for constrained reconfigurable manipulator with harmonic drive transmission. Int J Control Autom 2017; 15: 2364–2375.

27. Bian, Jiang. Decentralized adaptive optimal control of large-scale systems with application to power systems. IEEE Trans Ind Electron 2015; 62: 2439–2447.

28. Zhao, Wang, Shi, et al. Decentralized control for large-scale nonlinear systems with unknown mismatched interconnections via policy iteration. IEEE Trans Syst Man Cybern Syst 2018; 48: 1725–1735.

29. Dong, Zhou, Liu, et al. Decentralized robust optimal control for modular robot manipulators via critic-identifier structure-based adaptive dynamic programming. Neural Comput Appl 2018. DOI: 10.1007/s00521-018-3714-8.

30. Dong, Zhou, Liu, et al. Torque sensorless decentralized neuro-optimal control for modular and reconfigurable robots with uncertain environments. Neurocomputing 2018; 282: 60–73.

31. Chen, Tee, et al. Reinforcement learning control for coordinated manipulation of multi-robots. Neurocomputing 2015; 170: 168–175.

32. Lian, Chen, et al. Near-optimal tracking control of mobile robots via receding-horizon dual heuristic programming. IEEE Trans Cybern 2016; 46: 2484–2496.

33. Modares, Lewis, Naghibi-Sistani MB. Adaptive optimal control of unknown constrained-input systems using policy iteration and NNs. IEEE Trans Neural Netw Learn Syst 2013; 24: 1513–1525.

34. Zhao, Shi, Wang. Asymptotically stable critic designs for approximate optimal stabilization of nonlinear systems subject to mismatched external disturbances. Neurocomputing 2019. DOI: 10.1016/j.neucom.2018.08.092.

35. Zhao, Wang, Shi, et al. Decentralized control for large-scale nonlinear systems with unknown mismatched interconnections via policy iteration. IEEE Trans Syst Man Cybern Syst 2018; 48: 1725–1735.
