Sage Journals: Discover world-class research

Abstract

Polishing robot is an automatic system in which the robot controls the end effector to fix the polishing tool and finish the workpiece polishing efficiently. In order to solve the problem of how to maintain the stability of actuator contact force in the robot automatic polishing system, a learning algorithm of robot impedance control parameters based on reinforcement learning is proposed and the impedance control model is established in this paper. The influence parameters (inertia M, damping B, stiffness K) of impedance performance are analyzed by numerical simulation method and the optimized impedance parameters are obtained at last. Due to the small number of iterations and high data utilization rate, reinforcement learning algorithm is more suitable for robot constant force tracking. In the process of applying reinforcement learning algorithm, a combination of dynamic matching method and linearization method is proposed to predict the output distribution of the state, which greatly improves the cost function of the evaluation strategy, and impedance parameters corresponding to the optimal strategy are obtained. Finally, steam turbine blade is taken as polishing test part. The average roughness of the selected points of test part after polishing is only 0.302μm, and much less than 1.151μm before polishing, which verifies the feasibility of the proposed impedance control method.

Keywords

Industrial robot Surface polishing Impedance control Reinforcement learning

Introduction

In the face of increasingly high machining requirements and complex polishing tasks, robot is widely used in the field of surface polishing.¹ Yousedizadeh et al.² designed an automatic on-line trajectory generator based on operation intention detection. The designed impedance controller enables the robot to actively comply, ensuring the stability of force in processing and the safety of the operator. Du et al.³ developed an automatic polishing system and designed a flexible end effector, which can control the contact force perfectly.

In order to control the strong vibration generated by the rotating robot in the polishing process, Dong et al.⁴ proposed a method for detecting and controlled the contact force with internal sensors, which can well suppress strong interference. Ding⁵ proposed a hybrid control method of polishing force and position for complex surface parts, and it can achieve stable force control and accurate position control. Zhang et al.⁶ proposed a reasonable control system, which can realize constant force polishing in mass production of industrial robots.

The design idea of robot impedance control is to establish the dynamic relationship between the end force of the robot and its position deviation, so as to control the end force of the robot by controlling the joint displacement of the robot.⁷ Due to the high robustness and simple implementation of impedance control, scholars from home and abroad are committed to the application of impedance control in robot industry.^8–10 However, the traditional impedance control cannot track force accurately. Due to the complexity of robot polishing and high cost of processing information acquisition, robot polishing is consistent with the feature that reinforcement learning process needs less information.

In recent years, with the rapid development and application of machine learning algorithm, the combination of robot impedance control strategy and machine learning algorithm has been further developed. Reinforcement learning algorithm can find high-return strategies only through the reward function, that is to say by trial and error method, which provides a new way for robots to achieve complex autonomous compliance control. Reinforcement learning algorithm can dynamically adjust the motion state of the robot.^11–13 Du et al.¹⁴ proposed a variable admittance control method based on fuzzy reinforcement learning. By dynamically adjusting the variable virtual damping term in the admittance controller, the position control accuracy of the robot is enhanced and the required energy is reduced. Li et al.¹⁵ proposed a learning variable impedance robot control method. Simulation experiments show that several interactions can successfully complete the learning task of force control, which is suitable for the force-sensitive tasks.

At present, the impedance controller of industrial robot is mainly used in the fields of drilling, grinding, deburring, and simple assembly. The research and application of force control strategy for machining small complex curved parts at home and abroad cannot solve the problem. Therefore, the turbine blade is taken as polishing object in this paper. The robot impedance control strategy and application technology based on reinforcement learning algorithm are studied in the surface polishing process. The robot surface polishing experimental system is designed and verified.

The paper is organized as follows. Sec. 2 provides the impedance control method of polishing robot and the influence of impedance control parameters on control performance. In Sec. 3, the state-transition probability model of the impedance control robot is established by the Gauss process model, and the output distribution of the model is predicted by the improved algorithm of dynamic matching and linearization. Sec. 4 provides the feasibility of the algorithm, which is verified by simulation and experiment. Sec. 5 summarizes the benefits and the limitations of the study and the potential future research directions.

Process analysis of surface polishing robot

Modeling of surface polishing robot

The joint coordinate system of surface polishing robot is established in Figure 1, and the parameters of polishing robot are listed in Table 1, Where $θ_{i}$ is the joint angle, $d_{i}$ is the offset of the connecting rod, $a_{i}$ is the length of the connecting rod, and $α_{i}$ is the torsion angle of the connecting rod.

Figure 1.

Joint coordinate system of polishing robot.

Table 1.

Motion parameters of polishing robot.

Robot joint	$θ_{i}$ (°)	$d_{i}$ (mm)	$a_{i}$ (mm)	$α_{i}$ (°)
1	$θ_{1}$	675	0	0
2	$θ_{2}$	0	260	−90°
3	$θ_{3}$	0	680	0
4	$θ_{4}$	670	−35	−90°
5	$θ_{5}$	0	0	−90°
6	$θ_{6}$	454	0	−90°

The linkage model and motion model of robot are established by D-H method. The forward kinematics is solved and analyzed. The inverse kinematics is solved by algebra method, and the inverse solutions of multiple groups are optimized. The simulation of robot motion model is carried out in MATLAB robotics toolbox, which verifies the correctness and rationality of kinematics analysis.¹⁶

Impedance control method of surface polishing robot

The robot can adjust the polishing trajectory in the polishing coordinate system dynamically according to the end contact force. The purpose of impedance control is to establish the dynamic response model of robot position and contact force for industrial robots. Impedance control methods can be divided into position-based impedance control and moment-based impedance control.¹⁷ In the actual polishing process of robot, the impedance control method based on position is simple and robust. Therefore, the impedance control method based on position is adopted in this paper. Impedance represents the relationship between end contact force and trajectory deviation for robot. Its mathematical expression is:

\begin{matrix} M_{d} [\overset{\cdot\cdot}{X} (t) - {\overset{\cdot\cdot}{X}}_{d} (t)] + B_{d} [\overset{\cdot}{X} (t) - {\overset{\cdot}{X}}_{d} (t)] \\ + K_{d} [X (t) - X_{d} (t)] = F_{e} \end{matrix}

(1)

Where $M_{d}$ , $B_{d}$ , and $K_{d}$ represent the inertia matrix, damping matrix and stiffness matrix of the expected impedance model respectively. The force signal deviation $F_{e} = F - F_{d}$ can be obtained by introducing expected force signal $F_{d}$ into impedance control. According to position-based impedance control method, robot motion in free space and constraint space can be expressed uniformly. When robots move in free space, the contact force between the robot and the workpiece, $f_{e} = 0$ , the contact force does not require position correction. When robots move in constrained space, $f_{e} \neq 0$ , the contact force needs to be corrected. The deviation between the contact force and the expected contact force is processed by the impedance controller to produce a position correction. The terminal trajectory of the robot can be corrected in time sequence by adding the position correction to the target position. The position correction can be expressed as:

e_{e} = (e_{x}, e_{y}, e_{z})

(2)

The corrected position can satisfy with the dynamic relationship between the force and position of robot. In order to track the polishing force, the force deviation is used to replace the contact force in the impedance expression. The position correction which is generated by the impedance controller can ensure that the robot always maintains a constant normal contact force during the blade polishing process. Position-based impedance control flow chart is represented in Figure 2.

Figure 2.

Position-based impedance control flow chart.

Because the robot moves in three-dimensional space, the impedance simulation system model of surface polishing robot is established in Simulink software through the formula and the motion model of the robot, as shown in Figure 3.

Figure 3.

Simulation model of surface polishing robot impedance control.

Influence of impedance control parameters on control performance

Suitable target impedance parameters can improve the robustness of the control system and the adaptability to the environment for robot impedance control. In order to study the influence of three target parameters (inertia $m_{d}$ , damping $b_{d}$ , and stiffness $k_{d}$ ) of robot impedance control, the impedance control simulation model of curved surface polishing robot established above is used, and the single channel simulation of impedance control is carried out by using the control variable method, that is, keeping the two impedance parameters unchanged and studying the third parameter.

The simulation curve of contact force with time is shown in Figure 4. When damping parameter $b_{d} = 400$ , stiffness parameter $k_{d} = 1$ , and inertia parameter $m_{d}$ is only changed.

Figure 4.

Influence of inertial parameters on system performance.

The simulation curve of contact force with time is shown in Figure 5. When inertia parameter $m_{d} = 50$ , stiffness parameter $k_{d} = 1$ , and damping parameter $b_{d}$ is only changed.

Figure 5.

Influence of damping parameters on system performance.

The simulation curve of contact force with time is shown in Figure 6. When inertia parameter $m_{d} = 50$ , damping parameter $b_{d} = 300$ , and stiffness parameter $k_{d}$ is only changed.

Figure 6.

Influence of stiffness parameters on system performance.

In Figure 4, the change of inertia parameter $m_{d}$ has little effect on the steady-state error of the system, and it mainly affects the overshoot and response speed of the system. When $m_{d}$ is small, the system response time is short and overshoot is small too. With the increase of $m_{d}$ , overshoot increases and response time becomes longer. The inertia parameter $m_{d}$ reflects the change of acceleration term of the end effector in the polishing process. When the mass of the end effector is small, its inertia is small, so the actuator can respond quickly. When the mass of the end effector increases, the inertia increases and the impact on the workpiece increases, so the overshoot increases and the stabilization time increases correspondingly. Therefore, the inertia parameters should be selected according to the actual mass of the moving part of the end effector.

In Figure 5, $b_{d}$ mainly affects the overshoot and stability time of the system, and it has little effect on the steady-state error. When the damping coefficient $b_{d}$ is small, it takes a long time for the system to reach steady state, and the overshoot and system oscillation are large. Therefore, the damping parameter $b_{d}$ needs to be selected according to the polishing system environment.

In Figure 6, the stiffness parameter $k_{d}$ has a great influence on the steady-state error of the system. The stiffness coefficient reflects the rigidity of the robot when it contacts with the workpiece. The contact force between the robot and the workpiece can be controlled by adjusting the stiffness parameter, so as to meet the requirements of polishing operation.

The steady-state error of the system contact force mainly depends on the target stiffness parameters $k_{d}$ , environment stiffness $k_{e}$ , and environment position $x_{e}$ . In the robot polishing process, the most suitable $k_{d}$ , $k_{e}$ , and $x_{e}$ are unknown, and difficult to measure directly. For this reason, reinforcement learning method is used to train the impedance control algorithm to obtain the appropriate reference value.

Parameter optimization of impedance control based on reinforcement learning

Learning process of impedance parameters based on reinforcement learning

Reinforcement learning is mainly used to solve sequential decision problems.¹⁸ The improved reinforcement learning algorithm for impedance control is mainly divided into three layers. The bottom layer uses the existing data to learn the state-transition probability model. The input of the bottom layer is the state of the Experiment I (including polishing force-position, desired polishing force-position, and corresponding impedance parameters), and the output is the system state-transition probability model fitted by Gaussian process. The middle layer is to predict the output state distribution according to the model, that is to say, the probability distribution of different state-action pairs (the state at time t transfers to the state at time t + 1). The top level is responsible for strategy evaluation and optimization. The cost function is used to evaluate the strategy in the long-term polishing process. The gradient-based strategy search method is used to obtain the optimal strategy and impedance control parameters.

Firstly, the state-transition probability model of the algorithm is established based on the sample data. Due to the uncertainty of machining process, Gaussian process model is used to describe the state-transition probability of robot polishing process. The Gaussian process is completely specified by the mean function $m (\cdot)$ and kernel function $k (\cdot, \cdot)$ .¹⁹

The Gaussian process model is usually assumed that the training model is a normal distribution sample with zero mean value according to the prior probability of the model, so the unique Gaussian process model can be obtained by covariance matrix. The covariance function is the kernel function of the state-transition probability model. Because the time period and the variance of contact force should be as small as possible in the robot polishing process, the prior mean function of Gaussian model is set as $m \equiv 0$ . Kernel function²⁰ is expressed in equation (3) and hyper parameter is expressed in equation (4).

k ({\tilde{x}}_{p}, {\tilde{x}}_{q}) = {σ_{f}}^{2} \exp (- \frac{| d |^{2}}{2 l^{2}})

(3)

θ \in \arg min \frac{\partial L (θ)}{\partial θ_{i}}

(4)

Improved predictive output distribution function

There are two methods to predict the output distribution dynamic matching method and linearization method. The dynamic matching method has better prediction effect when the probability is low, while the linearization method is more suitable for predicting the distribution of high probability states.²⁰ Because the two methods have different prediction effects for different probabilities, an algorithm combining the two methods is proposed, which uses a regulating factor to balance the prediction methods. Due to the lack of empirical data in the early stage, dynamic matching method was used. In the medium term, the algorithm has accumulated experience. At this time, it needs to predict the distribution characteristics accurately, so it is more suitable for the linearization method. In the later stage, in order to ensure that the output distribution can accurately follow the actual state, the dynamic matching method is used for prediction. The improved algorithm flow is as follows:

Step 1: Calculate the action distribution of the previous state $p (π (x_{t - 1}))$ ;

Step2: Calculate the state-action joint distribution of the previous state $p (x_{t - 1}, π_{t - 1})$ ;

Step 3: Calculate adjustment factor according to the characteristic points and probability accumulation of historical state $δ$ ;

Step 4: If $δ \leq 1$ , select dynamic matching method; else, go to step 5;

Step 5: When using the linearization method, in order to avoid the large prediction difference between the dynamic matching method and the linearization method, the prediction correction is added to the prediction results, $p (Δ_{t}) = k \cdot p_{linear} (Δ_{t})$ ;

Step 6: Calculate $p (Δ_{t})$ according to the selected algorithm;

Step 7: Calculate distribution $p (x_{t})$ of the following states.

The regulatory factor $δ$ is:

δ = \frac{c_{dynamic}}{c_{linear}}

(5)

The correction factor is:

k = | 1 - \frac{p_{linear}}{p_{dynamic}} | \int p (Δ_{t}) d Δ_{t}

(6)

In the expression of adjustment factor, $c_{dy namic}$ and $c_{linear}$ are respectively the immediate cost of using dynamic matching method and linearization method to predict the distribution. The correction coefficient includes the cumulative data of the output probability distribution, which can reflect the stage of prediction. The output distribution of the improved algorithm is shown in Figure 7. The average error between the predicted output distribution and the actual distribution is less than 20%.

Figure 7.

The effect of the improved algorithm.

After the output distribution $p (Δ_{t})$ of Gaussian process prediction is obtained, the relationship between the subsequent states and the output distribution can be determined by the state-transition probability model, that is, $Δ_{t} = x_{t} - x_{t - 1} + ε$ . The mean and variance of the subsequent states are given in equations (7) and (8), which will be used as the input of $p (x_{t})$ , and then $p (x_{t + 1})$ will be calculated, and the distribution of all subsequent states can be obtained by iterating in turn.

E_{t} = E_{t - 1} + E_{Δ}

(7)

{var}_{t} = {var}_{t - 1} + {var}_{Δ} + cov (x_{t - 1}, Δ_{t}) + cov (Δ_{t}, x_{t - 1})

(8)

The distribution of all subsequent states is obtained by predicting the output distribution. An index that represents the pros and cons of a strategy need aggregating all state distributions with the corresponding cost function to evaluate each subsequent state. The index of the evaluation strategy is as follows.

V^{π} (x_{0}) = \sum_{0}^{T} \int c (x_{t}) p (x_{t}) dt

(9)

$c (x_{t})$ is the cost function of the corresponding state, which is used as the basis to judge whether the strategy selected from the current state to the target state is the optimal. After the value function $V^{π} (x_{0})$ of the strategy is obtained, strategy optimization method based the gradient is used to select optimal strategy and corresponding optimal parameters.

Simulation experiment and analysis

Steam turbine blade is taken as polishing test part in order to verify effect of the impedance control method. During the polishing possess, the end of the robot arm is in contact with the rigid surface of the profile to be machined of the steam turbine blade. Assuming that the damping and rigidity of the blade rigid surface to be machined are $B_{E} = [150 150 150] Ns \cdot m^{- 1}$ and $K_{E} = [2000 2000 2000] N \cdot m^{- 1}$ respectively, the goal of control is that robot automatically learns to control the end contact force to reach the expected value. The training objectives are expected force and expected position (i.e. the planned tool position for blade polishing). In the process of learning, if the force is stable, $| F_{z} - F_{zd} | \leq 0.5 N$ , and the force overshoot is less than 2 N, the iterative learning is considered to be successful, else it is considered as learning failure. $K_{d} \in [\begin{matrix} 0.1 & 50 \end{matrix}]$ and $B_{d} \in [\begin{matrix} 50 & 300 \end{matrix}]$ are the limits of impedance parameters of polishing robot.¹⁵

In the first experiment, 1000 sets of initial sample data were provided for the algorithm. This sample data were collected in the experiment under the condition of fixed impedance parameters, that is to say, $K_{0}$ and $B_{0}$ were fixed at the initial time. According to the experimental data, the state-transition probability model is updated, and the subsequent state prediction, evaluation, and optimization were carried out to calculate the impedance control parameters. A total of 20 iterations were designed, and the initial state of each iteration is the end state of the previous experiment. The calculated impedance parameters were applied to the impedance model established previously, and 1000 new sets of data were obtained. These data were updated and put into the training set of the GP model for the next iteration. The schematic diagram of the experimental process is shown in Figure 8.

Figure 8.

Flow chart of experiment process.

The pseudo-codes of the impedance control parameter learning process based on reinforcement learning is as follows.

ImpedanceControlParameterLearning()

var init $K_{0}, B_{0}$

while $E_{ss} \geq Δ F_{d}, E_{max} \geq Δ F_{max}$ do

UpdateDataset()

TrainingDataset()

establishProbabilisticDynamicModel()

//Status values and parameter input values are included in X.

//Y is Training target value

train sample X,Y

get $θ$ , $k (x, x)$ //hyperparameter, kernel function then get $p (x_{t})$ // distribution of the following states

ImprovedPredictionAlgorithm()

CostFunction()

// $V^{π} (x_{0})$ is the indicator of the evaluation strategy

// $c (x_{t})$ is the cost function of the corresponding status

$V^{π} (x_{0}) \leftarrow \sum_{0}^{T} \int c (x_{t}) p (x_{t}) dt$

GradientBasedStrategySearchMethod()

getImpedanceController()

getRobotMotionParameters()

end

Impedance parameter adjustment process

In this section, a reinforcement learning-based robot impedance parameter adjustment method is proposed. This method needs to train reinforcement learning agents through the change relationship between the end force and the position of the robot, combine with the expected output (expected force and expected position), and ultimately determine the appropriate impedance parameters. The value of the cost function reflects the difference between the current state and the expected state. The smaller the cost function is, the closer the current state is to the expected state. The change process of the immediate cost in the first experiment is shown in Figure 9. As it is the first time for learning and requires the accumulation of data and information, the predicted cost deviates greatly from the actual cost.

Figure 9.

Immediate cost function value of the first experiment.

Figure 10 shows the change process of the immediate cost in the seventh experiment. Compared with the first experiment, the actual cost of the seventh experiment has been accurately predicted. It shows that the trained algorithm can accurately predict the system state and parameters. As the experiment goes on, the difference between the predicted cumulative cost and the real cumulative cost becomes smaller and smaller.

Figure 10.

Immediate cost function value of the seventh experiment.

Figure 11 shows the stiffness coefficient variation curve in the first, seventh, and twentieth iteration process. The variation curve of the damping parameter is shown in Figure 12. It reveals that the prediction ability of impedance parameters is gradually improved with the increase of learning times, and the parameters are constantly changing. After the seventh iteration, the appropriate impedance parameters are obtained. By adjusting the stiffness and damping parameters dynamically, the stability and flexibility of the system are ensured. In the twentieth experiment, optimal impedance parameters of the robot system are obtained, which are $K = 2.37$ and $B = 161.32$ respectively. At the same time, because of the gradual optimization of impedance parameters in the later stage of iteration, the time (sampling interval × sampling times) required for system stability is also reduced, which optimizes the dynamic performance of the system.

Figure 11.

The change process of stiffness coefficient: (a) the first iteration, (b) the seventh iteration, and (c) the twentieth iteration.

Figure 12.

The change process of damping coefficient: (a) the first iteration, (b) the seventh iteration, and (c) the twentieth iteration.

The key data to obtain stable impedance parameters in the process of parameter training are listed in Table 2. With the progress of learning, the system also has the ability to predict parameters and stabilize quickly.

Table 2.

Key parameters in training process.

Training times	T (settling time)	K (stiffness coefficient)	B (damping coefficient)
1	Instability	Instability	Instability
7	6.12 s	4.45 N/m	188.54 Ns/m
20	2.82 s	2.37 N/m	161.32 Ns/m

Simulation analysis

The force change curve in the learning process is shown in Figure 13. It illustrates that the controller can realize the stable and rapid force tracking after the learning process. Mission 1–6 failed due to excessive overshoot or steady-state error. At the seventh time, the force control task was successfully completed.

Figure 13.

The learning process of force control: (a) the first learning, (b) the sixth learning, (c) the seventh learning, and (d) the twentieth learning.

The mechanical arm of polishing robot can stably track the force after contacting with the blade surface through rapid adjustment from the simulation experiment. After seven times of learning, the end of the manipulator can quickly adjust the end position, so that the end contact force can be stabilized to the desired force in a shorter time. After the seventh learning, the force control performance is close to stability and it can be stabilized quickly to the expected value without overshoot. After 20 times of iterative learning, optimal control strategy is obtained, and the cumulative cost is minimum. At this time, the end contact force of the manipulator only needs 0.76 s to reach stable. Table 3 is the comparison of the impedance parameter optimization algorithm based on reinforcement learning with fuzzy iterative algorithm²¹ and adaptive iterative algorithm.²² It can be seen that the algorithm based on reinforcement learning has more advantages than fuzzy iterative algorithm in terms of iteration times and force error.

Table 3.

Comparison between reinforcement learning and other algorithms.

Algorithm	Force errors/N	Iteration times
Improved reinforcementlearning algorithm	0.258	5
Adaptive iterative algorithm	1.209	15
Fuzzy iterative algorithm	0.535	18

Experimental study

In this section, the polishing system of steam turbine blade was designed. The hardware of the experimental system includes KUKA KR16-2 industrial robot, robot controller, external measurement equipment (three-dimensional force sensor ME3DT120), signal acquisition equipment (signal amplifier and signal acquisition card PCIe-6320), polishing device (connector, motorized spindle, and polishing tool) and host computer, as shown in Figure 14. The hardware system can drive the robot motion, collect the force signal, and drive the motorized spindle. The software system adopts C/S (client/server) architecture. Polishing robot controls the robot movement by instruction program through KRL, and feeds back the parameters such as the end position of the robot to the host computer in real time. By processing the collected force signal and combining with the actual pose of the robot, the host computer calculates the position compensation value by the algorithm. The compensation value is transmitted to the robot through the KUKA-RSI Ethernet interface, so as to realize the stable control of the end polishing force and position.

Figure 14.

System structure of surface polishing robot.

In order to verify the effect of impedance control, machining experiments without impedance control and with impedance control are carried out in the designed experimental system. The machining parameters are selected as follows.²³ The spherical polishing head was used for polishing experiment, and the spindle speed was 6000 r/min. The expected contact force is 5 N, and the diameter of the spherical polishing head is 12 mm, and the diameter of the tool rod of the polishing head is 3 mm, and the spindle feed rate is 2 mm/s. Input the optimized impedance parameters in the control software.

The robot is controlled to move from the zero position to the designated position in the impedance control experiment, and a constant reference force of 5 N is given for tracking. Firstly, the force tracking experiment is carried out without impedance control, and the planned trajectory is directly imported for processing. Then the impedance control is added to the system, and the initial impedance parameters in the polishing experiment are set according to the previous simulation results. Finally, the impedance parameters are adjusted according to the actual effect. Figure 15 is the contact force without impedance control. Figure 16 shows the change of contact force when the impedance parameters $k = 5.24$ and $b = 157.1$ after training.

Figure 15.

Contact force without impedance control.

Figure 16.

Contact force with impedance control.

It shows the change of contact force is unstable without impedance control. The robot always moves according to the planned trajectory because the cutter location point is not adjusted according to the contact force. When the polishing tool begins to contact the workpiece with impedance control, the contact force does not reach the designated expected contact force due to the workpiece error, clamping error, measurement error, and so on. With the progress of impedance control, the contact force mostly remains within $5 \pm 1 N$ , and only a few fluctuation range is greater than 1 N, which also shows that the impedance control is effective to maintain the stability of the end force.

In the polishing experimental system, the inner arc surface of the steam turbine blade is polished to verify the influence of the impedance control algorithm and control parameter optimization method of the polishing robot. The spherical wool felt combined with grinding paste is used to polish the semi polished steam turbine blade. The polishing experiment of steam turbine blade is shown in Figure 17. The main experimental parameters are listed in Table 4.

Figure 17.

Polishing experiment of steam turbine blade.

Table 4.

Experimental parameters of robot polishing.

Parameter item	Parameter value
The diameter of polishing tool (mm)	12
Particle size of grinding paste (mesh)	4000
Spindle speed (r/min)	6000
Stiffness coefficient (N/m)	2.37
Damping coefficient (Ns/m)	161.32

The surface quality of the inner arc surface of the steam turbine blade before and after polishing is shown in Figure 18. TIME210 is selected as instrument of roughness measuring. The average value of the six roughness measurement points before and after polishing is taken to obtain the average deviation value (Ra) of the inner arc surface contour of the steam turbine blade in Table 5. The average surface roughness of the six measuring points is 1.151 μm before fine polishing, and it is 0.302 μm after fine polishing, and there is no obvious texture, which meets the Ra = 0.8 μm requirement of the inner arc surface roughness of the steam turbine blade. The feasibility of the proposed algorithm and the designed polishing system is verified.

Figure 18.

Effect before and after fine polishing of inner arc profile of turbine rotor blade: (a) before fine polishing and (b) after fine polishing.

Table 5.

Average deviation of contour arithmetic of inner arc surface of turbine blade.

	Arithmetical mean deviation of the profile Ra(μm)						Average value Ra (μm)
	1	2	3	4	5	6
Before fine polishing	1.174	1.097	1.162	1.191	1.125	1.156	1.151
After fine polishing	0.219	0.293	0.408	0.215	0.392	0.283	0.302

Conclusion

The impedance control of surface polishing robot and the parameter optimization method based on reinforcement learning are studied in this paper. The impedance control parameter method based on reinforcement learning algorithm is also discussed. The impedance control state-transition model of surface polishing robot is established by Gaussian process method. The improved algorithm combining dynamic matching method and linearization method is used to predict the output distribution of the model and the actual output error is controlled within 20%. The improved reinforcement learning algorithm is used for training impedance parameter, and optimal impedance parameters are obtained and added to the control set. The feasibility of the algorithm is verified by simulation experiment. The experiment results show that the average roughness of the selected points of steam turbine blade after polishing is only 0.302, which verifies the accuracy of the algorithm and the feasibility of the experimental system.

However, there are still some jobs need to be improved and studied. For example, the case of only one type of blade polishing force impedance control method is studied, and the generality of the algorithm is not strong. In the follow-up work, more other types of steam turbine blades will be studied. A lot of in-depth research on the algorithm will be conducted, and more sample data will be obtained to improve the applicable scene of the applicable scene of the algorithm.

Footnotes

Appendix

Notation	Definition
$θ_{i}$	Joint angle
$d_{i}$	The offset of the connecting rod
$a_{i}$	The length of the connecting rod
$α_{i}$	The torsion angle of the connecting rod
$M_{d}$	Inertia matrix
$B_{d}$	Damping matrix
$K_{d}$	Stiffness matrix
$m_{d}$	Inertia parameter
$b_{d}$	Damping parameter
$k_{d}$	Stiffness parameter
$p (π (x_{t - 1}))$	Action distribution of the previous state
$p (x_{t - 1}, π_{t - 1})$	State-action joint distribution of the previous state
$δ$	Adjustment factor
$p (Δ_{t})$	Distribution of current state output
$p (x_{t})$	Distribution of the following states
$c_{dy namic}$	Immediate cost of using dynamic matchingmethod
$c_{linear}$	Immediate cost of using linearization matchingmethod
$V^{π} (x_{0})$	Index of the evaluation strategy
$c (x_{t})$	Cost function

Author Contributions

Yufeng Ding. Wuhan University of Technology School of Mechanical and Electrical Engineering. Associate professor. Yufeng Ding has put forward the impedance control method of surface polishing robot and theoretical framework. JunChao Zhao. Wuhan University of Technology School of Mechanical and Electrical Engineering. Master. JunChao Zhao has built simulation model of surface polishing robot impedance control and carried out simulation experiment. Xinpu Min. Wuhan University of Technology School of Mechanical and Electrical Engineering. Master. Xinpu Min has optimized the impedance control parameters are by using reinforcement learning algorithm and does some related experiments.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study received financial support from National Natural Science Foundation of China (E51875429): Research on the theory and method of product manufacturing quality control in cloud manufacturing environment.

ORCID iD

JunChao Zhao

Code availability

Some models and code generated or used during the study are proprietary or confidential in nature and may only be used with restrictions (e.g. the code can only run in KUKA robot system).

References

Zhu

. Discussion on grinding and polishing application of complex parts based on industrial robot. China Strateg Emerg Ind 2019; 2019(4): 202–203. (in Chinese)

Yousefizadeh

Flores Mendez

JDD

Bak

. Trajectory adaptation for an impedance controlled cooperative robot according to an operator’s force. Autom Constr 2019; 103: 213–220.

Sun

Feng

, et al. Automatic robotic polishing on titanium alloy parts with compliant force/position control. Proc IMechE, Part B: J Engineering Manufacture 2015; 229(7): 1180–1192.

Dong

Ren

, et al. Contact force detection and control for robotic polishing based on joint torque sensors. Int J Adv Manuf Technol 2020; 107(5–6): 2745–2756.

Ding

Min

, et al. Research and application on force control of industrial robot polishing concave curved surfaces. Proc IMechE, Part B: J Engineering Manufacture 2019; 233(6): 1674–1686.

Zhang

Tang

, et al. A polishing robot force control system based on time series data in industrial internet of things. ACM Trans Internet Technol 2021; 21(2): 1–22.

Zheng

Cao

. Robot impedance control method adapting to unknown or changing environment stiffness and damping parameters. China Mech Eng 2014; 25(12): 1581–1585. (in Chinese)

Ferretti

Magnani

Rocco

, et al. Impedance control for industrial robots. IEEE Int Conf Robot Autom 2000; 4: 4027–4032.

Cho

Kim

, et al. A strategy for connector assembly using impedance control for industrial robots. In: 2012 12th international conference on control, automation and systems, Jeju, South Korea, 17–21 October 2012. New York: IEEE.

10.

Woo

Kong

. Frequency-shaped impedance control for safe human–robot interaction in reference tracking application. IEEE/ASME Trans Mechatron 2014; 19(6): 1907–1916.

11.

Liang

Chen

Liao

, et al. A novel impedance control method of rubber unstacking robot dealing with unpredictable and time-variable adhesion force. Robot Comput Integr Manuf 2021; 67: 102038.

12.

Kim

Mishra

Limosani

, et al. Control strategies for cleaning robots in domestic applications: a comprehensive review. Int J Adv Robot Syst 2019; 16(4): 1729881419857432.

13.

Zhu

Guo

Fang

, et al. A path-integral-based reinforcement learning algorithm for path following of an autoassembly mobile robot. IEEE Trans Neural Netw Learn Syst 2020; 31(11): 4487–4499.

14.

Wang

Yan

, et al. Variable admittance control based on fuzzy reinforcement learning for minimally invasive surgery manipulator. Sensors 2017; 17(4): 844.

15.

Zhang

, et al. Learning variable impedance control based on reinforcement learning. J Harbin Eng Univ 2019; 40(02): 304–311. (in Chinese)

16.

Min

. Impedance control strategy of surface polishing robot. Wuhan: Wuhan University of Technology, 2020. (in Chinese)

17.

. Impedance control for position control robot based on force and force torque information. Control Decis 2016; 31(05): 957–960. (in Chinese)

18.

Craig

Brendan

Lech

, et al. Pseudo-rehearsal: achieving deep reinforcement learning without catastrophic forgetting. Neurocomputing 2021; 428: 291–307.

19.

Williams

. Probability, statistics, and random processes for engineers. New Jersey: Cengage Learning, 2012, pp.213–217.

20.

Wang

. Research and implementation of reinforcement learning algorithm based on Gaussian process. Wuhan: Wuhan University of Technology, 2016. (in Chinese)

21.

Wen

. Simulation study on adaptive impedance control of general force feedback equipment. Beijing: Jiaotong University, 2017. (in Chinese)

22.

Xiao

Zhang

, et al. Constant force tracking of robot surface based on adaptive iteration. J Beijing Univ Aeronaut Astronaut 2019; 45(04): 641–649. (in Chinese)

23.

Chen

Fan

Lin

. Study on surface polishing process parameters based on response surface methodology. Digit Manuf Sci 2019; 17(2): 113–118+125. (in Chinese)

Impedance control and parameter optimization of surface polishing robot based on reinforcement learning

Abstract

Keywords

Introduction

Process analysis of surface polishing robot

Modeling of surface polishing robot

Impedance control method of surface polishing robot

Influence of impedance control parameters on control performance

Parameter optimization of impedance control based on reinforcement learning

Learning process of impedance parameters based on reinforcement learning

Improved predictive output distribution function

Simulation experiment and analysis

Impedance parameter adjustment process

Simulation analysis

Experimental study

Conclusion

Footnotes

Appendix

Author Contributions

Declaration of conflicting interests

Funding

ORCID iD

Code availability

References