Abstract
In this article, a model-free decentralized sliding mode control method based on an adaptive dynamic programming algorithm is proposed to solve the optimal trajectory tracking control problem of modular and reconfigurable robots. The dynamic model of the modular and reconfigurable robot is formulated as a synthesis of joint subsystems with interconnected dynamic couplings. Based on the sliding mode control technique, the optimal control problem of the modular and reconfigurable robot system is transformed into an optimal compensation problem for the unknown dynamics of each joint subsystem, in which the interconnected dynamic coupling effects among the subsystems are approximated by the developed neural network identifier. Based on the policy iteration scheme and the adaptive dynamic programming algorithm, the Hamilton–Jacobi–Bellman equation is solved by using a critic neural network, so that the optimal control policy can be obtained. The closed-loop system is proved to be asymptotically stable by using Lyapunov theory. Finally, simulation results are provided to demonstrate the effectiveness of the method.
Introduction
Modular and reconfigurable robots (MRRs) 1 have attracted wide attention in the robotics community because they offer better structural adaptability and flexibility than conventional robots. MRRs have been applied in many extreme environments, such as space exploration, disaster assistance, hazard survey, and medical assistance. Correspondingly, appropriate control systems are required to guarantee the accuracy and efficiency of MRRs across different tasks.
As a useful tool for dealing with disturbances, the sliding mode control (SMC) technique can effectively improve the robustness of nonlinear systems. Several insightful papers address stabilization and tracking control problems by using SMC. Saleh and Fairouz 2 investigated a robust adaptive second-order SMC method for the tracking problem of a class of uncertain linear systems with matched and unmatched disturbances. Donya and Saleh 3 proposed an adaptive super-twisting decoupled terminal SMC technique for a class of fourth-order systems. Saleh and Fairouz 4 offered an adaptive global second-order sliding surface for perturbed dynamical systems with matched and unmatched external disturbances. Moreover, SMC has been widely used to design controllers for manipulators. A sliding mode robust control method was presented for a pan-tilt joint manipulator. 5 A sliding mode adaptive neural network (NN) control was presented for a nonholonomic mobile manipulator. 6 Other investigations have addressed control problems of manipulators,7–9 and these methods have been further implemented for controlling MRR systems. A stable adaptive fuzzy SMC method was proposed for an MRR to satisfy modular software. 10 Slotine and Sastry 11 applied the SMC technique to 2-degree-of-freedom (DOF) rigid MRRs to track time-varying reference trajectories. Ficola and Cava 12 presented an SMC method with two sliding surfaces and applied it to the control of a two-joint MRR. However, the above-mentioned methods do not consider the efficiency of MRRs. Indeed, MRRs are often used in extreme environments without an external power supply; therefore, an ideal controller for MRRs should guarantee the robustness of the robotic system while also optimizing a composite of output power and error characteristics.
Optimal control was developed about five decades ago by Bellman 13 and Pontryagin, 14 and many practical and theoretical results have been reported since.15–18 From a mathematical perspective, an optimal control problem is addressed by minimizing a desired performance index, and the solution can be obtained by approximately solving the Hamilton–Jacobi–Bellman (HJB) equation, which gives the sufficient condition for optimality. To address the tracking problem of a wheeled mobile robot, Lewis and Syrmos 18 investigated a near-optimal tracking control method based on a receding-horizon dual-heuristic programming algorithm. Based on reinforcement learning theory, Bhsin et al. 19 addressed the optimal coordination control problem of multiple robots dealing with targets with desired trajectories. Nageshrao et al. 20 proposed an optimal passive control for a 2-DOF manipulator by using energy balance theory. Tang et al. 21 proposed a learning-based adaptive optimal control method to solve the tracking problem of n-link robots. The methods mentioned above all belong to the centralized control scheme. However, an important property of MRRs is that their modules can be replaced, removed, or added without adjusting the control parameters of the others, so there exist physical restrictions on information exchange among the joint modules of the robotic system. These restrictions make it impossible to adopt a centralized control method for MRR systems. To overcome the drawbacks of conventional centralized control schemes, Li et al. 22 presented a decentralized robust control method for MRRs based on self-tuned feedback gain. In our previous works, we also investigated decentralized robust control, 23 decentralized trajectory tracking control, 24 decentralized fault-tolerant control, 25 and decentralized force/position control 26 for MRR systems.
However, the control methods for MRRs mentioned above do not consider the optimal implementation of the controllers, which would guarantee the stability of the robotic system while ensuring the optimality of a composite of error characteristics and output energy efficiency. Some studies have combined adaptive dynamic programming (ADP)-based optimal methods with decentralized control schemes. Based on ADP theory, Bian et al. 27 presented a decentralized optimal controller for systems with unmatched uncertainties. Zhao et al. 28 proposed a policy iteration (PI) algorithm-based decentralized scheme for large-scale systems with mismatched interconnections. Dong et al. 29 addressed the optimal control problems of MRRs by combining model-based compensation control with ADP-based learning control, and this research was further extended to the optimal tracking control of MRRs in uncertain environments. 30 These ADP-based decentralized methods solve the stabilization control problem of complex robot manipulators and nonlinear systems. However, to the best of the authors' knowledge, few studies have dealt with model-free optimal decentralized SMC of manipulator systems, especially MRR systems.
In this article, a model-free decentralized SMC method is presented for MRR systems via the PI scheme and the ADP method. First, the dynamic model of the MRR is formulated as a synthesis of joint subsystems with interconnected dynamic coupling (IDC) effects. Then, based on the SMC technique, the optimal control problem is transformed into an optimal compensation problem for the unknown dynamics of each subsystem. A decentralized control strategy is designed that uses only the local dynamic information of each joint module, with the subsystem dynamic model completely unknown. Based on the ADP method and the PI algorithm, the HJB equation is solved by using a critic NN, from which the optimal control policy is derived. By Lyapunov theory, the closed-loop MRR system is proved to be asymptotically stable. Finally, simulations are presented to verify the advantages and effectiveness of the proposed method.
The main contributions of this article can be summarized as follows:
To the best of our knowledge, this is the first time the ADP approach has been extended to the model-free decentralized optimal control problem of MRR systems. The proposed scheme can be applied to MRRs with different configurations and in different environments without changing the control parameters.
Unlike conventional ADP methods that use both an action NN and a critic NN, in this research the optimal control policy is obtained by using only a critic NN, and training of an action NN is no longer needed. This implies that the computational burden can be reduced effectively.
Unlike existing methods that treat the IDC effects as a kind of system disturbance with known upper bounds, in this article the IDC effects, which are of a larger order of magnitude than some other system dynamics, are handled separately and in a targeted manner by the developed NN identifier-based compensation control law.
Problem statement
The dynamics of an n-DOF robot system can be formulated as follows
where
In practical applications, such as space exploration or disaster rescue, an MRR consists of many joint modules, which leads to a heavy computational burden and a complex control structure. To address this drawback, we treat each robotic joint as a subsystem of the whole MRR system, with IDCs among the subsystems. The dynamic model of the ith subsystem is expressed as
where
Let
where
where
Assumption 1
The desired position
To remove the norm-boundedness assumption on the IDC, the desired states of the coupled subsystems are used instead of the actual ones. The IDC term can be written as
where
where
Remark 1
The system dynamic terms
Accordingly, in the following section, a model-free decentralized optimal SMC method is presented for MRRs to ensure that the closed-loop systems are asymptotically stable.
Model-free decentralized optimal SMC based on ADP
Derivation of the optimal SMC scheme
Define the joint position tracking error as
Then, the time derivative of equation (5) can be obtained as
According to the frame of the SMC method, we can define the sliding surface as follows
where
According to equations (7) and (8), the objective of the SMC is to satisfy the relation
Therefore, the control objective becomes the design of the optimal SMC law
where
It is noted that the decentralized control
Definition 1
In equation (4), decentralized control
where
Define the Hamilton function and the optimal performance index function as follows
Under the framework of optimal control design, one obtains that
If the
Rewriting
Combining equations (13) and (10), we get
Next, an identifier-based controller is used to compensate IDCs.
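The display equations of this subsection are not reproduced in this text, so the exact sliding surface used by the authors cannot be quoted here. As a rough illustration only, the sketch below assumes the conventional first-order sliding variable s = de + lam * e built from the joint position tracking error e and its derivative de; the function name `sliding_variable`, the gain `lam`, and the numerical values are hypothetical. It also checks the usual consequence of that choice: once s = 0 holds, the error obeys de = -lam * e and decays exponentially.

```python
def sliding_variable(q, dq, q_d, dq_d, lam):
    """Conventional sliding variable s = de + lam * e (assumed form, not taken from the source)."""
    e = q - q_d        # joint position tracking error
    de = dq - dq_d     # time derivative of the tracking error
    return de + lam * e


# On the surface s = 0 the error dynamics reduce to de = -lam * e.
lam, dt = 2.0, 1e-3    # hypothetical surface gain and integration step
e = 0.5                # hypothetical initial tracking error in rad
for _ in range(3000):  # 3 s of forward-Euler integration
    e += dt * (-lam * e)

e_final = e
s_on_surface = sliding_variable(e, -lam * e, 0.0, 0.0, lam)
```

With this choice the controller only has to drive a scalar quantity per joint to zero, which is what allows the tracking problem to be posed subsystem by subsystem.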
Identification of the IDC
In this section, an identifier is presented to approximate the term
Assumption 2 31
NN activation function
Assumption 3 32
The NN approximation error is upper bounded by an unknown constant.
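To make the two assumptions concrete, the following sketch approximates an unknown coupling term with a single-hidden-layer NN whose activation (tanh) is bounded. It is not the identifier or the weight update law of this article, whose equations are missing from this text: the coupling function `coupling`, the fixed random input weights, and the normalized gradient rule are all stand-ins chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 20
V = rng.normal(size=(n_hidden, 2))   # fixed random input weights (assumption)
b = rng.normal(size=n_hidden)        # fixed random biases (assumption)
W = np.zeros(n_hidden)               # output weights, tuned online


def features(x):
    """Bounded activation: every entry of tanh lies in (-1, 1)."""
    return np.tanh(V @ x + b)


def coupling(x):
    """Hypothetical stand-in for an unknown interconnection term."""
    return np.sin(x[0]) * x[1]


gamma = 0.5                          # hypothetical learning gain
errors = []
for _ in range(20000):
    x = rng.uniform(-1.0, 1.0, size=2)           # sampled states of coupled joints
    phi = features(x)
    err = coupling(x) - W @ phi                  # identification error before the update
    W += gamma * err * phi / (1.0 + phi @ phi)   # normalized gradient step
    errors.append(abs(err))

err_first = float(np.mean(errors[:200]))   # mean error early in training
err_last = float(np.mean(errors[-200:]))   # mean error at the end of training
```

The residual that remains after training plays the role of the bounded approximation error in Assumption 3.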
According to the above assumptions,
where
The NN identifier, which is used to approximate equation (17), is as follows
where
where
where
where
By using equations (21) and (22), one obtains that
where the weight update law of equation (24) is given as follows
where
where
in which
where
where
where
The auxiliary function
in which
Theorem 1
Consider the IDC term in equation (16) and the dynamic system in equation (17). Using the NN identifier in equation (18) together with the weight update law in equation (24) guarantees asymptotic identification of the IDCs in the sense that
provided
Proof
Define the Lyapunov function candidate
which satisfies the following relations
where
The time derivative of equation (34) is obtained as follows
Canceling the common terms in equation (37), denoted as
If equation (33) is satisfied, equation (38) can be written as
where
Let
The region of attraction in equation (40) can be made arbitrarily large to contain any initial condition. Then, we can get that
while
As per the definition of
According to equations (18), (19), and (20), we can design
where the weight
Critic NN implementation
For finding the optimal control of the MRR (equation (2)), we need to address HJB equation (13) for
Because the index function is highly nonlinear and non-analytic, a critic NN is used to approximate
where
where
For system (2), combining equations (36) and (44), we get
Substituting equation (44) into equation (13), the Hamilton function can be reformulated as
where
where
The partial derivative of
Then, one can obtain the approximate Hamilton function as
We use the objective function
where
Define the weight approximation error as
Then, according to equations (46), (49), and (50), we conclude that
The dynamics of the weight approximation error can be given as
According to equations (40) and (44), the desired optimal control policy can be formulated as
and it can be approximated as
Note that the expression in equation (55) is obtained by using the critic NN only, so training of an action NN is no longer needed. This implies that the computational burden can be reduced effectively.
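The critic-only idea can be illustrated on a toy problem. The sketch below is a minimal example under stated assumptions, not the update law of this article: it uses a hypothetical scalar system dx/dt = a*x + b*u with cost integrand q*x**2 + r*u**2, a one-term critic V_hat(x) = w*x**2, and a normalized gradient descent on the squared Hamiltonian residual. The control is computed directly from the critic gradient, so no separate action network appears anywhere.

```python
import numpy as np

a, b = -1.0, 1.0   # hypothetical scalar dynamics dx/dt = a*x + b*u
q, r = 1.0, 1.0    # weights of the cost integrand q*x**2 + r*u**2
lr = 1.0           # hypothetical critic learning rate
w = 0.0            # critic weight, V_hat(x) = w * x**2


def policy(x, w):
    """u = -(1/2) r^{-1} b dV_hat/dx, evaluated from the critic alone."""
    return -0.5 / r * b * (2.0 * w * x)


rng = np.random.default_rng(0)
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)             # sampled state, stands in for excitation
    u = policy(x, w)
    dx = a * x + b * u                     # state derivative (measured in practice)
    sigma = 2.0 * x * dx                   # regressor: gradient of x**2 times dx
    e_h = q * x**2 + r * u**2 + w * sigma  # Hamiltonian residual
    w -= lr * sigma * e_h / (1.0 + sigma**2) ** 2

# Closed-form optimum of the scalar Riccati equation, used only as a check.
w_star = (a * r + np.sqrt(a**2 * r**2 + b**2 * q * r)) / b**2
```

For these values the weight settles at sqrt(2) - 1, the positive root of the scalar Riccati equation, and the resulting feedback is u = -w*x. Only the state and its derivative enter the update, which is the sense in which such a scheme avoids an explicit model and an action NN.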
Theorem 2
Consider the cost function in equation (43), which is approximated by the single-layer critic NN, and the estimated cost function in equation (47) that is built by
Proof
Choose the Lyapunov function candidate as
The time derivative of
Hence, we can get
Combining equations (42) and (55),
Theorem 3
Consider the n-DOF MRR system with the subsystem dynamic model in the form of equation (2), which is completely unknown when the optimal controller is designed. If the decentralized optimal SMC (equation (57)) is adopted for the MRR system, then the closed-loop robotic system is asymptotically stable.
Proof
Choosing the Lyapunov candidate function
According to equations (32), (42), and (53), the time derivative of
As
where
where
Moreover, equation (61) means that
Remark 2
To overcome the difficulty of solving the HJB equation, a local PI method is introduced following previous literature.33–35 The iterative procedure of the PI method with equation (9) is given in Appendix 1.
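Appendix 1 is not reproduced in this text, so the procedure cannot be quoted. As a generic illustration of policy iteration, the sketch below alternates policy evaluation and policy improvement on a hypothetical linear joint-like model with a quadratic cost, where the HJB equation reduces to an algebraic Riccati equation (this special case is known as Kleinman's algorithm). The matrices `A`, `B`, `Q`, `R` and the zero initial gain are assumptions, and unlike the model-free scheme of this article the sketch uses a known model.

```python
import numpy as np


def evaluate_policy(A, B, Q, R, K):
    """Policy evaluation: solve Ac^T P + P Ac + Q + K^T R K = 0 with Ac = A - B K."""
    Ac = A - B @ K
    n = A.shape[0]
    eye = np.eye(n)
    # Vectorized Lyapunov equation written with Kronecker products.
    lhs = np.kron(eye, Ac.T) + np.kron(Ac.T, eye)
    rhs = -(Q + K.T @ R @ K).reshape(-1)
    return np.linalg.solve(lhs, rhs).reshape(n, n)


def policy_iteration(A, B, Q, R, K0, iters=20):
    """Alternate evaluation and improvement, starting from a stabilizing gain K0."""
    K = K0
    for _ in range(iters):
        P = evaluate_policy(A, B, Q, R, K)   # value of the current policy
        K = np.linalg.solve(R, B.T @ P)      # improved policy K = R^{-1} B^T P
    return P, K


A = np.array([[0.0, 1.0], [-1.0, -0.5]])     # hypothetical stable joint-like model
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P, K = policy_iteration(A, B, Q, R, K0=np.zeros((1, 2)))

# At convergence P satisfies the algebraic Riccati equation.
riccati_residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
```

Because the open-loop model chosen here is already stable, the zero gain is an admissible starting policy; in general the first policy must be stabilizing for the evaluation step to have a meaningful solution.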
Simulations
Simulation setup
In this section, two MRRs are given for simulation. For configuration A, the dynamic formulation is given as
and that of configuration B is given as
The desired trajectories of both configurations A and B are written as follows
The NN weight vector estimation can be written as
Simulation results
Simulation results are presented to demonstrate the effectiveness of the proposed method in terms of control torques, joint position tracking, convergence of the NN weights, and position tracking errors. The simulations run on a host computer with an Intel Core i7-7700 CPU (3.60 GHz) and 8.00 GB RAM, using MATLAB 2016a on Windows 10. Two control methods are compared in the simulations: an existing NN-based optimal control method, such as that in the studies by Dong and colleagues,23,24 and the proposed ADP-based decentralized optimal SMC method.
Figure 1 shows the joint position tracking curves under the existing method. One can observe that the chattering effect is obvious during the first 2 s, which is caused by the lack of dynamic decomposition and optimal compensation of the IDCs. Figure 2 illustrates the joint position curves of configuration A joint 1 under the proposed method. Compared with Figure 1, the desired trajectories are tracked within a very short time, which verifies the effectiveness of the proposed method. Figures 3 and 4 show the joint position tracking error curves of configuration A under the existing method and the proposed method, respectively. In Figure 3, the position errors are obvious in the first few seconds and the amplitude of the steady-state error reaches ±0.02 rad. In Figure 4, the position errors converge quickly and remain close to 0.

Figure 1. Position tracking for configuration A with the existing method: (a) Joint 1 and (b) Joint 2.

Figure 2. Position tracking for configuration A with the proposed method: (a) Joint 1 and (b) Joint 2.

Figure 3. Position error for configuration A with the existing method: (a) Joint 1 and (b) Joint 2.

Figure 4. Position error for configuration A with the proposed method: (a) Joint 1 and (b) Joint 2.
Figures 5 and 6 show the velocity tracking curves of configuration A under the existing method and the proposed method, respectively. Figures 7 and 8 illustrate the corresponding velocity error curves. Because the existing method does not compensate for the IDC effects, its tracking error is larger than that of the proposed method.

Figure 5. Velocity tracking for configuration A with the existing method: (a) Joint 1 and (b) Joint 2.

Figure 6. Velocity tracking for configuration A with the proposed method: (a) Joint 1 and (b) Joint 2.

Figure 7. Velocity error for configuration A with the existing method: (a) Joint 1 and (b) Joint 2.

Figure 8. Velocity error for configuration A with the proposed method: (a) Joint 1 and (b) Joint 2.
Figure 9 illustrates the control torque curves of configuration A with the existing method. For joint 1, the initial control torque is large and may place a burden on the motors. This is because the NNs need time to learn in order to supply the large torque. For joint 2, the control torque shows sudden oscillations at some instants. This is because the method does not compensate for the IDCs. Figure 10 illustrates the control torque curves of configuration A with the proposed scheme. The output torques are optimized and behave in a way that matches the output power of the motors.

Figure 9. Control torque for configuration A with the existing method: (a) Joint 1 and (b) Joint 2.

Figure 10. Control torque for configuration A with the proposed method: (a) Joint 1 and (b) Joint 2.
Figure 11 shows the critic NN weight curves of joints 1 and 2 of configuration A under the proposed method. Owing to the PI implementation and the critic NN training, the weights converge within 1 s. The weights of the critic NN converge to

Figure 11. Critic NN for configuration A with the proposed method: (a) Joint 1 and (b) Joint 2.
Figures 12–22 show the trajectory tracking curves, position error curves, velocity curves, velocity error curves, control torques, and weight convergence results for configuration B. The results are similar, and the same conclusions can be drawn as for configuration A. This shows that the proposed method can be applied to a different configuration without adjusting the control parameters. The weights of the critic NN converge to

Figure 12. Position tracking for configuration B with the existing method: (a) Joint 1 and (b) Joint 2.

Figure 13. Position tracking for configuration B with the proposed method: (a) Joint 1 and (b) Joint 2.

Figure 14. Position error for configuration B with the existing method: (a) Joint 1 and (b) Joint 2.

Figure 15. Position error for configuration B with the proposed method: (a) Joint 1 and (b) Joint 2.

Figure 16. Velocity tracking for configuration B with the existing method: (a) Joint 1 and (b) Joint 2.

Figure 17. Velocity tracking for configuration B with the proposed method: (a) Joint 1 and (b) Joint 2.

Figure 18. Velocity error for configuration B with the existing method: (a) Joint 1 and (b) Joint 2.

Figure 19. Velocity error for configuration B with the proposed method: (a) Joint 1 and (b) Joint 2.

Figure 20. Control torque for configuration B with the existing method: (a) Joint 1 and (b) Joint 2.

Figure 21. Control torque for configuration B with the proposed method: (a) Joint 1 and (b) Joint 2.

Figure 22. Critic NN for configuration B with the proposed method: (a) Joint 1 and (b) Joint 2.
From the simulations, one can conclude that the proposed decentralized optimal SMC method guarantees stability and tracking accuracy.
Conclusion
This article proposes a model-free decentralized SMC method for MRR systems via PI scheme-based ADP. First, the dynamic model is formulated as a synthesis of joint subsystems with IDC effects. Then, based on the SMC technique, the decentralized optimal control problem is transformed into an optimal compensation problem for the unknown dynamics of each subsystem. Based on ADP and the PI algorithm, the HJB equation is solved by using a critic NN, from which the optimal control policy is obtained. By Lyapunov theory, the closed-loop MRR system is proved to be asymptotically stable. Finally, simulations verify the effectiveness of the method.
ADP-based control methods have been successfully used, in theory, to address optimal control problems in battery management, residential energy management, the water–gas shift reaction, and the coal gasification process. However, the effectiveness analyses of these works all rely on numerical analysis and simulation results. Indeed, a commonly recognized drawback of research on ADP-based optimal control methods is the lack of experimental studies and practical applications on physical systems, especially robotic systems. Establishing an experimental platform for robotic systems that satisfies real-time and accuracy requirements is a key problem in implementing the proposed decentralized SMC method on actual MRR systems, and it is also our future research topic.
Footnotes
Appendix 1
Handling Editor: James Baldwin
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China (grant nos 61374051, 61773075, and 617030555), the Scientific Technological Development Plan Project in Jilin Province of China (grant nos 20160520013JH, 20190103004JH, and 20160414033GH), and Project of the Engineering Laboratory of Intelligent Robot and Vision Measurement and Control Technology in Jilin Province (2019C010).
