Abstract
The problem of optimal tracking control for robot–environment interaction is studied in this article. The environment is regarded as a linear system and an admittance control with iterative linear quadratic regulator method is obtained to guarantee the compliant behaviour. Meanwhile, an adaptive dynamic programming-based controller is proposed. Under adaptive dynamic programming frame, the critic network is performed with radial basis function neural network to approximate the optimal cost, and the neural network weight updating law is incorporated with an additional stabilizing term to eliminate the requirement for the initial admissible control. The stability of the system is proved by Lyapunov theorem. The simulation results demonstrate the effectiveness of the proposed control scheme.
Keywords
Introduction
Robot applications are becoming more and more widespread, for example in rehabilitation therapy, assembly automation and surgery.1–4 Robots can either work independently to accomplish tasks or cooperate with their human partners. In practical applications, the robot will inevitably interact with the external environment.5–7 Consequently, in recent years, interaction control between the robot and the environment has attracted great attention and is considered to be of great importance.
In existing research, two main approaches are applied to achieve compliant behaviour of the robot, namely hybrid position/force control and impedance control.8,9 The first approach requires position subspace and force subspace decomposition, task planning and control law switching during execution. Because it does not consider the dynamic coupling between the environment and the robot, the accuracy of hybrid position/force control cannot be guaranteed.10 In contrast, the second approach aims to adjust the mechanical impedance to a target one, which guarantees that the robot is compliant with the interaction force imposed by the external environment. Impedance control ensures the safety of both the robot and the environment, and it has been proved to be more feasible and more robust. According to the causality of the controller, impedance control has two implementation methods: one is named impedance control and the other admittance control. In an impedance control system, the interaction force can be estimated from the desired motion trajectory and the impedance model, while in an admittance control system, the reference trajectory is obtained from the measured environmental external force and the desired admittance model. Therefore, in this article, admittance control is adopted to solve the problem of robot–environment interaction control.
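The admittance relation just described, a reference trajectory generated from the measured force through a desired admittance model, can be sketched numerically. The block below integrates a target admittance model Md·ẍ + Dd·ẋ + Kd·x = F to turn a force profile into a reference offset; the inertia, damping and stiffness values are illustrative assumptions, not parameters from this article.

```python
import numpy as np

def admittance_reference(forces, dt, Md=1.0, Dd=8.0, Kd=40.0):
    """Integrate the target admittance model Md*xdd + Dd*xd + Kd*x = F
    with forward Euler to turn a force profile into a reference offset."""
    x, xd = 0.0, 0.0
    xs = []
    for F in forces:
        xdd = (F - Dd * xd - Kd * x) / Md
        xd += xdd * dt
        x += xd * dt
        xs.append(x)
    return np.array(xs)

# A constant 10 N push: the offset settles near F/Kd = 0.25 m.
dt = 0.001
forces = np.full(20000, 10.0)
offset = admittance_reference(forces, dt)
```

With a constant push the offset settles near F/Kd, which is exactly the compliant behaviour admittance control aims for; the reference sent to the motion controller would be the user-defined trajectory shifted by this offset (the sign convention here is an assumption).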
In an admittance control system, the force and the admittance model are the two key parts. When robot–environment interaction exists, the force can be detected and measured by sensors installed on the end-effector of the robot arm. However, deriving optimal parameters of the admittance model is non-trivial. On the one hand, it is usually difficult to derive the desired admittance model because of the complexity of environmental dynamics; on the other hand, a fixed admittance model cannot satisfy all cases. Taking human–robot cooperation as an example, variable admittance control is necessary to ensure more efficient performance.11 To solve these problems, iterative learning has been studied in the area of intelligent robot control. It has been investigated to obtain admittance parameters that adapt to an unknown environment. The aim of this approach is to introduce human learning skills into the robot and improve control performance by repeating a task. Cohen and Flash12 proposed an impedance learning control scheme using an associative search network to complete a wall-following task. A neural network (NN) has also been introduced into impedance control to regulate the parameters.13 However, the iterative learning method requires the robot to operate repeatedly, which is inconvenient in practice and is not feasible in many situations. Love and Book,14 Uemura and Kawamura,15 Gribovskaya et al.,16 Stanisic and Fernández,17 Landi et al.18 and Yao et al.19 have proposed adaptation approaches to address the problems stated above.
Robotic motion control is a challenging task, as it is difficult to obtain an accurate model given that the robot is a non-linear and highly coupled system. Proportional–integral–derivative (PID) control, NN control, adaptive control and other control methods have been applied to robot systems.20–27 As a classical control method, PID control is employed in robot systems and can track a given reference trajectory well.28 It is acknowledged that PID control has some advantages, such as a simple structure and good robustness, but it is not easy to select suitable PID parameters if the controlled plant is complex. In addition, when dynamic uncertainties exist in the system, PID control cannot satisfy performance requirements on the magnitude of overshoot, the rise and settling times and so on. NNs share fundamental characteristics of the human brain and can simulate human information processing, and are therefore widely used in the control field for unknown system identification. NN control can model uncertain dynamics online to improve system performance.29 An admittance adaptation method combined with an NN-based controller has been applied to robot systems.30
Tracking control is a significant research issue in the domain of intelligent robot control. For a controlled system, stability is only the minimum requirement. Optimal control also needs to be considered; that is, it is required to design an optimal tracking controller which ensures stability of the robot system while minimizing a cost function. Werbos31 proposed the adaptive dynamic programming (ADP) strategy, which is considered to be an effective approach to the optimal control problem.32 The key of the ADP method is to find a solution of the Hamilton–Jacobi–Bellman (HJB) equation. However, because it is a partial differential equation, its analytical solution is very difficult, or even impossible, to obtain when the controlled system is non-linear. To solve this problem, policy iteration is considered an effective method to find an approximate solution, but it requires an initial stabilizing control.33 In practice, however, the initial admissible control is usually very difficult to obtain. NNs have therefore been introduced to derive an approximate solution of the HJB equation. The approximate solution is obtained by an NN-based method, and the requirement of initial stability is eliminated by incorporating an additional term.34,35
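For the special case of a linear system with quadratic cost, the HJB equation reduces to the algebraic Riccati equation, and policy iteration takes Kleinman's classical form: starting from a stabilizing gain, alternate policy evaluation (a Lyapunov equation) and policy improvement. The sketch below, with arbitrary example matrices, illustrates both the iteration and why an initial admissible (stabilizing) control is needed: the Lyapunov solve in the evaluation step is only meaningful when A − BK is stable.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, 2.0]])   # open-loop unstable example system
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.array([[0.0, 5.0]])  # initial admissible (stabilizing) gain
for _ in range(20):
    Ak = A - B @ K
    # policy evaluation: solve Ak' P + P Ak = -(Q + K' R K)
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # policy improvement: K = R^{-1} B' P
    K = np.linalg.solve(R, B.T @ P)

# reference solution straight from the algebraic Riccati equation
K_are = np.linalg.solve(R, B.T @ solve_continuous_are(A, B, Q, R))
```

Starting instead from K = 0 here would make A − BK unstable and the evaluation step meaningless, which is exactly the limitation that the stabilizing term in the NN weight updating law later removes.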
Yang et al.30 paid attention to robot–environment interaction control, but did not consider the optimization problem. However, for the robot, how to optimize path tracking and minimize the cost function is very important. Based on the above discussion, the optimal tracking control problem for robot–environment interaction is studied in this article. Moreover, admittance control and the ADP approach are adopted to improve the system performance. The contributions of this article are listed below:
1. The environment with unknown dynamics is modelled as a linear system. An admittance adaptation method with an iterative linear–quadratic regulator (LQR) is obtained to achieve a compliant behaviour.
2. The ADP approach is introduced into the robot system to solve the optimal tracking problem. The critic network with radial basis functions (RBFs) is developed to approximate the minimum cost function. In addition, to eliminate the requirement for an initial admissible control, a stabilizing term is incorporated into the NN weight updating law.
The rest of this article is arranged as follows. Firstly, the robot and environment systems and control objectives are described. Next, the control scheme including admittance adaptation and optimal control using ADP is developed. Then, simulation studies are given. Finally, the conclusion is drawn.
Preliminaries and problem formulation
Robot dynamics
The n-link robot manipulator dynamics is given in the following Lagrangian form

M(q)q̈ + C(q, q̇)q̇ + G(q) = τ + Jᵀ(q)F   (1)

where q, q̇ and q̈ are the joint position, velocity and acceleration vectors, M(q) is the symmetric positive-definite inertia matrix, C(q, q̇)q̇ denotes the Coriolis and centrifugal torques, G(q) is the gravity vector, τ is the control torque and Jᵀ(q)F maps the interaction force F exerted by the environment onto the joints through the Jacobian J(q).
Define the reference trajectory as
Then, the first and second time derivatives of qe are given below
We define the sliding motion surface ξ as follows
where
Substituting equation (5) into equation (1), the error dynamics is obtained as follows
Then, the following system is obtained
The non-linear functions
Environment dynamics
It is assumed that the dynamics of the environmental interaction force obeys the equation given below
where CE and GE represent the unknown damping and stiffness of the environment, respectively. F denotes the interaction force, which can be detected and measured by a force sensor. x is the end-effector position in Cartesian space, and the corresponding desired trajectory xd is defined as
where
If we take equation (11) as a linear system with F as its control input and η as its states to be controlled, this equation relates x with xd
via the optimal feedback control law
This cost function also indicates that our motivation of modifying a desired trajectory xd
is to balance the contact force F with the tracking error
In this section, the robot and environment dynamics have been modelled. Next, we will design a control strategy to achieve compliant behaviour and optimal tracking control when the robot interacts with the environment.
Control scheme
A control scheme consisting of three parts is designed in this section, as shown in Figure 1: an optimal trajectory modifier using admittance control, a closed-loop inverse kinematics (CLIK) solver and a trajectory tracking controller based on the ADP technique.
Figure 1. An illustration of the proposed control scheme.
Trajectory modification using admittance control
The solution to equation (12) is an analogy with the LQR problem. It can be rewritten as
whose system counterpart is consistent with equation (11).
In this subsection, an algorithm proposed by Jiang and Jiang 36 is adopted to solve the algebraic Riccati equation (ARE) in equation (14) with unknown environment parameters CE , GE to derive the feedback gain Ke
Some notations are outlined here: n, m and d are the dimension of η, the dimension of F and the number of sampling instants, respectively. The sampled signals, together with the historical ones, comprise the following matrices
where
When the number of sampled data is large enough and the rank condition in equation (16) is satisfied, the algorithm can solve for Ke
by iteratively calculating equation (17) until
where the superscript
Once the optimal feedback gain Ke is obtained, we can use it to modify xd . Formulations are given as below
where
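Jiang and Jiang's off-policy algorithm solves the ARE directly from sampled data. As a simplified stand-in that illustrates the same data-driven idea and the role of the rank condition, the sketch below identifies a discrete-time model of the environment by least squares from sampled (η, F) pairs, checks that the regressor has full column rank, and then computes the feedback gain from the discrete ARE. The matrices are illustrative examples, and this is not the article's integral off-policy method.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
# illustrative discrete environment model: eta_{k+1} = Ad eta_k + Bd F_k
Ad = np.array([[1.0, 0.01], [-0.4, 0.95]])
Bd = np.array([[0.0], [0.01]])

# collect samples under an exploratory (exciting) input
etas, Fs = [np.array([1.0, 0.0])], []
for k in range(200):
    F = rng.normal()
    Fs.append(F)
    etas.append(Ad @ etas[-1] + Bd.flatten() * F)

Phi = np.hstack([np.array(etas[:-1]), np.array(Fs).reshape(-1, 1)])
Y = np.array(etas[1:])
assert np.linalg.matrix_rank(Phi) == Phi.shape[1]   # rank-condition analogue
Theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
Ad_hat, Bd_hat = Theta.T[:, :2], Theta.T[:, 2:]

# feedback gain Ke from the discrete algebraic Riccati equation
Q, R = np.eye(2), np.array([[1.0]])
P = solve_discrete_are(Ad_hat, Bd_hat, Q, R)
Ke = np.linalg.solve(R + Bd_hat.T @ P @ Bd_hat, Bd_hat.T @ P @ Ad_hat)
```

If too few samples are collected, or the input is not exciting enough, the rank check fails and no unique model (and hence no gain) can be extracted, mirroring the role of the rank condition in equation (16).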
Inverse kinematics using CLIK
The CLIK algorithm37 is employed to resolve the Cartesian reference trajectory xr into the corresponding joint-space trajectory qr.
Let the solution error
where Kf
is a positive user-defined matrix that decides the convergence rate of e. Expanding the above equations and combining with
integrating which yields the CLIK method
where
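As a concrete illustration of the CLIK loop, the sketch below resolves a Cartesian reference into joint motion for a 2-link planar arm: the joint velocity is obtained by inverting the Jacobian on the reference velocity plus the feedback correction Kf·e on the solution error, and is then integrated. Link lengths, gains and the target point are illustrative assumptions.

```python
import numpy as np

L1, L2 = 1.0, 1.0  # illustrative link lengths

def fkine(q):
    """Forward kinematics of a 2-link planar arm."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def clik(xr, xr_dot, q0, Kf, dt):
    """qdot = J^{-1}(q) (xr_dot + Kf e), integrated by forward Euler,
    where e = xr - fkine(q) is the task-space solution error."""
    q = q0.copy()
    for k in range(xr.shape[0]):
        e = xr[k] - fkine(q)
        qdot = np.linalg.solve(jacobian(q), xr_dot[k] + Kf @ e)
        q = q + qdot * dt
    return q

# resolve a fixed Cartesian point: e obeys e_dot = -Kf e and decays to ~0
dt, N = 0.001, 5000
xr = np.tile(np.array([1.2, 0.8]), (N, 1))
xr_dot = np.zeros((N, 2))
q_final = clik(xr, xr_dot, np.array([0.3, 0.5]), 5.0 * np.eye(2), dt)
```

The closed-loop error dynamics is ė = −Kf·e, so a larger (positive-definite) Kf gives faster convergence of the inverse kinematics solution, as stated above.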
Optimal control using ADP
As mentioned in the Introduction, it is very important for robots to optimize trajectory tracking while minimizing the designed cost. On the basis of optimal control theory, the optimal control of system (7) can be derived by solving the HJB equation in the frame of ADP. Consequently, in this subsection, our target is to find such an optimal control μ.
Assume that the functions
where
where
Then, the Hamiltonian function and the optimal cost function of robot system (7) are defined as below
We can obtain the HJB equation shown as
Suppose that the minimum value on the right-hand side of equation (27) exists and is unique; then from
Substituting the optimal control law (28) into equation (24) yields another form of HJB equation with respect to
Inspired by Liu et al.,34 we know that if the optimal function
where
From equations (28) and (31), the following
Then, substituting equations (31) and (32) into equation (29), we have
where
In fact, the ideal weight w and
Then, the derivative of equation (36) is
Based on equations (28) and (37), the approximate optimal control is obtained as
Similarly, applying equations (25), (37) and (38), the approximate Hamiltonian function
Define eH
as the error between
According to equations (33), (39) and (41), eH in equation (40) can be described as
To train the RBFNN, an appropriate weight updating law
where
where
It should be noted that
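The structure of the RBF critic can be illustrated on a toy fitting problem: a Gaussian feature vector φ(ξ), a linear weight w, and a normalized gradient-descent update that reduces the approximation error, in the spirit of the critic tuning law. The additional stabilizing term of the article's updating law is omitted here, and the centres, width, learning rate and target cost are all assumptions.

```python
import numpy as np

centers = np.linspace(-2.0, 2.0, 9)   # nine neurons, as in the simulation
width = 0.8

def phi(xi):
    """Gaussian RBF feature vector."""
    return np.exp(-((xi - centers) ** 2) / (2 * width ** 2))

rng = np.random.default_rng(1)
w = np.zeros(9)
alpha = 0.5
for _ in range(20000):
    xi = rng.uniform(-2.0, 2.0)
    f = phi(xi)
    e = w @ f - xi ** 2               # error against a known target cost xi^2
    # normalized gradient descent on e^2/2, cf. critic weight tuning
    w -= alpha * e * f / (1.0 + f @ f)

xi_test = np.linspace(-1.5, 1.5, 7)
V_hat = np.array([w @ phi(x) for x in xi_test])
```

In the actual scheme the quantity being driven down is the Hamiltonian residual eH rather than a supervised fitting error, but the mechanics of the normalized weight update are the same.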
Stability analysis
In this subsection, we will analyse the stability of the system and give a detailed proof that the approximation error
Theorem 1
Consider the robot system (7) with the approximate optimal control (38) and the NN weight updating law (43); then it is concluded that the approximation error
Proof
See the Appendix.
Numerical simulation
Simulation settings
A two-degree-of-freedom (2-DOF) planar manipulator is adopted to verify the proposed control scheme. It is constructed with the Robotics Toolbox38 with parameters shown in Table 1.
The numerical simulation shown in Figure 2 runs in MATLAB 2018a, where an ode3 solver is chosen with a fixed time step of 0.01 s and a simulation time of 20 s; other settings remain at their defaults. The initial joint position is
where CE, GE and x0 are chosen as
Table 1. Parameters of the robot manipulator.
Figure 2. Settings of the numerical simulation.
For the proposed control scheme, parameters are set as below: to calculate the optimal trajectory in equation (13),
Simulation results
In this subsection, two cases will be compared to demonstrate the validity of the proposed scheme. Note that the environment dynamics in the simulation is not totally consistent with that in equation (9), and x0 is unknown. Therefore, two different Ke values are considered and examined. Case 1: the feedback gain
Simulation results are shown in Figures 3 to 6. Figure 3 shows the modification process of the user-defined trajectory along the x-axis for both cases. The rank condition in equation (16) is not satisfied until around 4.1 s, after which the trajectory starts being modified. During the transient process, the modified trajectory of Case 2 exhibits a slight oscillation, which subsequently triggers larger tracking errors compared with Case 1. The steady-state position and force pairs of the Case 1 and Case 2 trajectories at 10.28 s are 0.13 m/−0.07 N and 0.14 m/−0.06 N, respectively, which is in line with the time series of the cost function in equation (12) for both cases, as shown in Figure 4. From the figure, we can see that after the modification of the trajectory, the cost function of Case 1 is smaller than that of Case 2, which implies that in this simulation setting, where the actual existence of the unknown x0 cannot be neglected, the feedback gain obtained from the proposed scheme is more appropriate. Note that, due to the unknown x0, the environment dynamics in equation (9) used for the design of the trajectory modifier differs from that in equation (47) used for simulation. Therefore, under this situation, neither the Ke of Case 1 nor that of Case 2 is actually the optimal one. However, the proposed method still works, treating the dynamics in equation (47) as a linear system with an appropriate feedback gain. This demonstrates the effectiveness of the proposed admittance control method.
Figure 3. Simulation results of trajectory modification.
Figure 4. The time series of the cost function in equation (12).
Figure 5. Modified trajectories corresponding to the different choices of Ke.
Figure 6. Simulation results of control performance.
Figures 5 to 7 are plotted for analysing the performance of the ADP-based controller. Figure 6 shows the control torques τ and the sliding mode surface z for Case 1 and Case 2. On the whole, the proposed scheme tracks both modified trajectories well, given that only nine neurons are used in the RBFNN, and the control torques are within the physical limits. Besides, weight convergence can be observed in Figure 7. Note that, because of the introduced additional term
Figure 7. Weights of the RBFNN.
Feedback gains of different
Conclusion
The optimal control of robots interacting with an unknown environment was studied in this article. An ADP-based controller with admittance adaptation was proposed. The unknown environment was regarded as a linear system, and compliant behaviour was guaranteed by the admittance adaptation control. In addition, an NN was introduced into the ADP controller to ensure trajectory tracking of the robot with minimal cost. The stability of the robot system was proved, and simulation studies demonstrated the effectiveness of the proposed control scheme.
Dynamic uncertainties and input constraints such as saturation and dead zone are very common in robot systems because of their complexity, and they not only degrade system performance but may also lead to instability.24,39,40 Therefore, within the frame of ADP, the optimal control problem with dynamic uncertainties and input constraints will be considered in our future work.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by Engineering and Physical Sciences Research Council (EPSRC) under grant EP/S001913 and Shenzhen Science and Technology Plan Project [JSGG20180507183020876].
