
Abstract

We report on the development of an implementable physics-data hybrid dynamic model for an articulated manipulator to plan and operate in various scenarios. Physics-based and data-driven dynamic models are also studied in this research to select the best model for planning. The physics-based model is constructed using the Lagrangian method, and the loss terms include inertia loss, viscous loss, and friction loss. For the data-driven model, three methods are explored: deep neural network, long short-term memory, and XGBoost. Our modeling results demonstrate that, after comprehensive hyperparameter optimization, the XGBoost architecture outperforms deep neural network and long short-term memory in accurately representing manipulator dynamics. With approximately 500k training data points, the XGBoost model closely matches the performance of the physics-based model, as assessed by the root mean square error between actual and estimated manipulator torque. The hybrid model with physics-based and data-driven terms has the best performance among all models based on the same root mean square error criterion, and it needs only about 24k training data points. In addition, we developed a virtual force sensor for the manipulator using the observed external torque derived from the dynamic model, and designed a motion planner based on the physics-data hybrid dynamic model. The external torque contributes to forces and torques on the end effector, facilitating interaction with the surroundings, while the internal torque governs manipulator motion dynamics and compensates for internal losses. By estimating external torque via the difference between measured joint torque and internal losses, we implement a sensorless control strategy, which is demonstrated through a peg-in-hole task. Lastly, a learning-based motion planner based on the hybrid dynamic model assists in planning time-efficient trajectories for the manipulator. This comprehensive approach underscores the efficacy of integrating physics-based and data-driven models for advanced manipulator control and planning in industrial environments.

Keywords

Dynamic model, hybrid model, data-driven, machine learning, virtual sensor, motion planning

Introduction

Industrial automation has been developed over the past 60 years for various tasks, such as inspection,¹ machining,²,³ and pick-and-place.⁴ Along with high speed, high precision, and low cost, machine intelligence is regarded as one of the key performance indices of robots. Recently, during the fourth industrial revolution (i.e., Industry 4.0), the focus has been on large-scale smart factories, which rely on many factors, such as the Internet of Things, cyber-physical systems, big data, and artificial intelligence.⁵ All of these require a dynamic model of the machine as a crucial and basic building block. Without an appropriate model, the behaviors of the machine cannot be predicted, planned, or controlled effectively.

The dynamic model of the machine includes various motion effects such as inertia forces, Coriolis forces, gravitational force, and some losses. Traditionally, the model was developed based on the physical principles and phenomena of the system, using either the Newton–Euler method or the Lagrangian method. However, the model of a complex system, such as a manipulator, cannot cover all the physical behaviors of the system, especially nonlinear behaviors, subtle un-modeled dynamics, and manipulator specification variations due to manufacturing processes. For example, the joint of the manipulator may have loss terms other than (Coulomb) friction and (viscous) damping, but in ordinary modeling work, we usually only consider these two because they have the most significant effects. Manipulator specification variations due to manufacturing processes mean that the specifications of a commercial manipulator are not exactly the same as those listed in the datasheet, such as the D-H table. In this case, a data-driven model is quite useful because it learns the phenomena and behavior of the manipulator from experience.

In this work, following our initial exploration of developing a physics-based model of manipulators,⁶ we extend our exploration to reliable data-driven and hybrid models (i.e., to cover un-modeled loss terms) that were derived using machine learning,⁷,⁸ where three methods were explored: deep neural network (DNN),⁹ long short-term memory (LSTM),¹⁰ and decision tree.¹¹ For example, Gillespie et al.¹² and Boucetta and Abdelkrim¹³ used DNN to deal with the uncertainty of the dynamic model for flexible and soft robots. In contrast, LSTM, a type of recurrent neural network, is commonly used for situations with time series. For example, Lai et al.¹⁴ described a way to solve the problem of adaptive fuzzy inverse compensation control for uncertain nonlinear systems with generalized dead-zone nonlinearities in uncertain actuators. The generalized dead zone of the motor exhibits hysteresis, where the values of friction in forward and reverse rotation are different. Without proper suppression, the nonlinear dead zone may cause more errors and even instability of the system. Therefore, some researchers have tested whether LSTM can make the fitted model more accurate.¹⁰,¹⁵ Finally, the decision tree approach uses a tree model to determine the consequences of events. The most famous method is XGBoost.¹⁶ It has been widely recognized in many machine learning and data mining challenges, such as Kaggle and the KDD Cup,¹⁷ which shows its potential to handle the un-modeled dynamics of robotic systems.

In addition to the dynamic model, a smart manipulator requires the use of many sensors.¹⁸,¹⁹ Force/torque sensors are a popular choice and have led to many applications that require contact interactions, such as assembly tasks and human–robot collaborations.²⁰,²¹ For example, peg-in-hole is a common assembly task in factories. Various works have addressed this issue. Lin²² developed a plug-in system for large transformers using a six-axis robotic arm. Kim et al.²³ developed a hole-detection algorithm for square peg-in-hole applications using force-based shape recognition. Tang et al.²⁴ developed an autonomous alignment strategy for peg-in-hole. Qin et al.²⁵ utilized a multi-sensor perception strategy to enhance the autonomy of uncertain peg-in-hole tasks. Beltran-Hernandez et al.²⁶ developed a variable compliance-control strategy for robotic peg-in-hole assembly. These three works utilized force/torque sensors. Due to the high cost of multi-axis force/torque sensors, other strategies for performing the peg-in-hole task have been proposed. For example, Park et al.²⁷ developed a low-cost peg-in-hole assembly strategy using contact compliance and without using a physical force/torque sensor. At present, most research on virtual force sensors focuses on the safety of human–machine collaboration (such as Yen et al.²⁸ and Li et al.²⁹). Similar applications often require establishing mathematical models and friction compensation to achieve higher accuracy. For example, Lai et al.¹⁴ developed a method to compensate for the dead zone of a general actuator through the fuzzy adaptive inverse compensation method. This research aims to establish all models at once through machine learning. Considering the high demand for force/torque sensing information on the end-effector of a manipulator, virtual force/torque transducers seem to be a worthwhile research topic²⁸,³⁰ and motivated us to initially explore this aspect in this article.
One of the popular methods to develop a virtual force/torque sensor of a manipulator is to estimate the force/torque values by subtracting internal torques from measured torques. Estimating the internal torque of the manipulator requires a high-precision dynamic model, so this extension smoothly follows the development of dynamic models reported in this work.

Besides dexterous manipulation, which requires force/torque estimation or measurement, trajectory planning is another important issue. Traditionally, the trajectory planning of industrial robots relies on physics models.³¹ However, with the advancements in machine learning, typical trajectory optimization can be achieved by AI algorithms. For example, Tian and Collins³² and Števo et al.³³ demonstrated the application of genetic algorithms to optimize trajectories for robot arms efficiently. Furthermore, reinforcement learning (RL) algorithms³⁴ have emerged as another popular method for trajectory optimization. Schulman et al.³⁵ successfully implemented RL techniques to generate optimal motions for a robot manipulator, particularly in pick-and-place tasks. The RL algorithms aim to learn optimal behaviors in an environment given a user-defined policy.³⁶ Thus, the integration of a reliable hybrid dynamic model of a robot arm and RL algorithms forms a trajectory planner, promising enhanced efficiency and adaptability for industrial robots in various tasks.

In short, motivated by the emerging development of data-driven approaches, we report on developing a reliable physics-data hybrid dynamic model for an industrial robot. The performance of the physics-based and data-driven dynamic models, and of the possible hybrid physics-data dynamic models, is investigated. While the reported research usually focuses on using a specific modeling method for a specific application, we are interested in finding a reliable and general performance trend of the dynamic model from the aspect of its composition. In addition, following the core development of the data-driven dynamic model of the manipulator, we report our initial investigation of the manipulation applications and a motion planner using the hybrid dynamic model. The contributions of this work include the following:

Developing an implementable physics-data hybrid dynamic model of the articulated manipulator among various data-driven dynamic models using machine learning techniques, such as DNN, LSTM, and XGBoost, and evaluating their performance.

Proposing an external torque observer based on the developed data-driven models and then proposing a virtual force/torque sensor via the observed external torque for robotic manipulation.

Proposing a sensorless peg-in-hole assembly strategy inspired by human operation.

Developing a learning-based time-efficient motion planner based on the physics-data hybrid dynamic model and validating it by experiments.

To position our contributions among existing literature, we compare our approach with others. Reinhart et al.³⁷ studied data-driven forward and inverse dynamic models of an industrial robot for pure feedforward control. Carron et al.³⁸ applied a physics-data hybrid dynamic model of a robotic system for tracking controller design. By contrast, we attempt to find an accurate digital twin of the robot arm for sensorless dexterous manipulation and high-performance motion planning. Yu et al.³⁹ developed a data-driven dynamic model for a fish robot to handle fluid dynamic issues; however, this research aims to compensate for the un-modeled dynamics of an articulated manipulator. Xu et al.⁴⁰ built a data-driven dynamic model for a cable-driven planar robot to predict its tension and collision conditions. Our data-driven model, in contrast, is not only used to plan motions for a robot arm but also applied to realize a torque observer for sensorless manipulation and learning-based motion planning. Lastly, our previous work⁶ proposed a learning-based motion planner; here, we implement a more accurate digital twin with the hybrid dynamic model and adjust the policy for RL to generalize the previous framework. Overall, a functional physics-data hybrid dynamic model is constructed for sensorless dexterous manipulation and learning-based motion planning of an industrial robot in various industrial applications.

The remainder of this article is organized as follows. The “Dynamic models” section introduces the dynamic models, and the “Performance of the dynamic models” section describes their performance. The “Design of a virtual force sensor” section describes the design of a virtual force sensor. The “Design of a motion planner” section presents the design of the motion planner. The “Experiment” section reports the experimental results, and the “Conclusion and future work” section concludes this research.

Dynamic models

The dynamic models utilized in this work were developed using two different approaches; one is a physics-based dynamic model, and the other is a data-driven model. The former is intrinsic, which allows us to truly understand the dynamics of the manipulator, but un-modeled dynamics, such as nonlinear terms, are difficult to identify. Furthermore, if the model has time-dependent terms (i.e., after a long operation time, the friction and damping terms are usually different), a remodel of the system is necessary. In contrast, the data-driven model is better able to capture nonlinear terms, but the trade-off is that the data collection and learning processes may take considerable time. While we are aiming at using a complete data-driven model for the tasks in this work, we are also interested in exploring the performance and dynamic characteristics of the physics-based model and the physics-data hybrid model for comparison purposes. Thus, this section describes the construction of the physics-based model and data-driven model separately.

The physics-based models

The physics-based dynamic model of the system was derived using the Lagrangian method. By forming the Lagrangian ( $L = T - V$ ) of the manipulator, which is composed of kinetic energy ( $T$ ) and potential energy ( $V$ ), the equation of motion (EOM) of the system can be derived according to the Lagrangian equation.

\frac{d}{d t} (\frac{\partial L}{\partial {\dot{q}}_{i}}) - \frac{\partial L}{\partial q_{i}} = Q_{i}^{'}, i = 1, \dots, n

(1)

where $q_{i}$ and $Q_{i}^{'}$ are the i-th generalized coordinate and the i-th generalized non-conservative force, respectively. The symbol n represents the number of independent degrees of freedom (DOF) of the system. After derivation and rearrangement of the terms, the EOMs of the system can generally be expressed as

M (q) \ddot{q} + C (q, \dot{q}) \dot{q} + G (q) = Q^{'}

(2)

where $M (\cdot)$ , $C (\cdot)$ , and $G (\cdot)$ represent the inertia term, Coriolis and centrifugal term, and gravitational term, respectively.
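As a minimal worked illustration of how equation (1) leads to the form of equation (2), consider a single-link pendulum. This example is not taken from the manipulator in this work: the point mass $m$, link length $l$, gravitational acceleration $g$, and the angle $q$ measured from the downward vertical are assumptions introduced only for illustration.

```latex
T = \tfrac{1}{2} m l^{2} \dot{q}^{2}, \qquad
V = - m g l \cos q, \qquad
L = T - V = \tfrac{1}{2} m l^{2} \dot{q}^{2} + m g l \cos q

\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right) = m l^{2} \ddot{q}, \qquad
\frac{\partial L}{\partial q} = - m g l \sin q

m l^{2} \ddot{q} + m g l \sin q = Q^{'}
\quad\Longrightarrow\quad
M(q) = m l^{2}, \quad C(q, \dot{q}) = 0, \quad G(q) = m g l \sin q
```

With a single DOF there is no Coriolis or centrifugal coupling, so $C$ vanishes; for a multi-link manipulator the same procedure yields configuration-dependent matrices.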

In this work, the derivation of EOMs was carried out for a 6-DOF articulated and collaborative manipulator (TM5-700, Techman Robot Inc.). The CAD model and configuration of the manipulator are shown in Figure 1. This robot is utilized as the experimental testbed for the proposed methodology. Using the joint angles of the manipulator ( $θ$ ) as the generalized coordinate, the general form of the physics-based EOM of the manipulator can be expressed as

\mathrm{EOM}_{\mathrm{physics}} (θ, \dot{θ}, \ddot{θ}) = M (θ) \ddot{θ} + C (θ, \dot{θ}) \dot{θ} + G (θ) = Q^{'}

(3)

Figure 1.

The manipulator TM5-700: (a) the CAD model, (b) the configuration, and (c) the simplified dynamic model.

For the ideal manipulator without loss, the non-conservative force $Q^{'}$ only contains joint actuation torque ( $τ_{\mathrm{motor}}$ ):

\mathrm{EOM}_{\mathrm{physics}} (θ, \dot{θ}, \ddot{θ}) = M (θ) \ddot{θ} + C (θ, \dot{θ}) \dot{θ} + G (θ) = Q^{'} = τ_{\mathrm{motor}}

(4)

The joints are controlled to move according to the defined motion, either position-controlled (i.e., angle), velocity-controlled (i.e., angular speed), or force-controlled (i.e., torque). This ideal physics-based model of the manipulator shown in (3)–(4) is hereafter referred to as Model P1.

In contrast, the empirical manipulator generally has loss, so the non-conservative force $Q^{'}$ in general contains two parts:

\mathrm{EOM}_{\mathrm{physics}} (θ, \dot{θ}, \ddot{θ}) = M (θ) \ddot{θ} + C (θ, \dot{θ}) \dot{θ} + G (θ) = Q^{'} = τ_{\mathrm{motor}} - τ_{\mathrm{loss}}

(5)

This realistic physics-based model of the manipulator is hereafter referred to as Model P2. The $τ_{\mathrm{loss}}$ is assumed to be generated at the joints, including inertia loss ( $B_{m} \ddot{θ}$ ), viscous loss ( $C_{m} \dot{θ}$ ), and friction loss ( $f_{c} (\mathrm{sign} (\dot{θ}))$ ):

τ_{\mathrm{loss}} (θ, \dot{θ}, \ddot{θ}) = B_{m} \ddot{θ} + C_{m} \dot{θ} + f_{c} (\mathrm{sign} (\dot{θ}))

(6)

where $f_{c}$ and $\mathrm{sign} (\cdot)$ represent the magnitude and direction of the friction force, respectively. No loss is assumed in other parts of the manipulator. Note that equations (4) and (5) represent different scenarios we have tested. Equation (4) indicates that the only non-conservative force of the manipulator model ( $Q^{'}$ ) is motor torque ( $τ_{\mathrm{motor}}$ ) (i.e., Model P1), and equation (5) indicates that the manipulator model also includes the loss terms ( $τ_{\mathrm{motor}} - τ_{\mathrm{loss}}$ ) (Model P2). Because the models contain different terms, their performance is expected to be different. In addition, note that the terms of $\ddot{θ}$ in (3) and (6) are different. The $M (θ) \ddot{θ}$ in (3) represents inertia forces, and the $B_{m} \ddot{θ}$ in (6) represents loss proportional to the acceleration $\ddot{θ}$ .
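The loss model of equation (6) can be sketched for a single joint as follows. This is only an illustrative sketch: it assumes that the friction term $f_{c} (\mathrm{sign} (\dot{θ}))$ is read as a constant magnitude multiplied by the direction of motion, and the coefficient values `B_M`, `C_M`, and `F_C` are hypothetical placeholders, not identified parameters of the TM5-700.

```python
import math


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return math.copysign(1.0, x) if x != 0.0 else 0.0


def loss_torque(dq: float, ddq: float, b_m: float, c_m: float, f_c: float) -> float:
    """Joint loss torque following the structure of equation (6).

    Assumption: the friction term f_c(sign(dq)) is read as f_c * sign(dq).
    """
    inertia_loss = b_m * ddq        # loss proportional to acceleration
    viscous_loss = c_m * dq         # loss proportional to velocity
    friction_loss = f_c * sign(dq)  # constant magnitude, direction of motion
    return inertia_loss + viscous_loss + friction_loss


# Hypothetical coefficients, for illustration only (not identified values).
B_M, C_M, F_C = 0.02, 0.5, 1.2

tau_forward = loss_torque(dq=1.0, ddq=0.0, b_m=B_M, c_m=C_M, f_c=F_C)
tau_reverse = loss_torque(dq=-1.0, ddq=0.0, b_m=B_M, c_m=C_M, f_c=F_C)
```

In this symmetric sketch the forward and reverse losses have equal magnitude and opposite sign; direction-dependent friction would require separate coefficients for each direction.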

The data-driven models

In addition to constructing the model using physics principles, the model can also be data-driven. Following a similar form as the EOM shown in (2), the data-driven EOM can be expressed as

\mathrm{EOM}_{\mathrm{data}} (θ, \dot{θ}, \ddot{θ}) = τ_{\mathrm{motor}}

(7)

where the right side of the equation is the active joint torque ( $τ_{\mathrm{motor}}$ ), which is utilized to support various effects of the manipulator's dynamic motion, including inertia, Coriolis and centrifugal forces, gravitational forces, and losses. More specifically, the left side of (7) contains the effects $f_{1} (θ, \dot{θ}, \ddot{θ})$ shown in (3) and $f_{2} (θ, \dot{θ}, \ddot{θ})$ shown in (6). Because the exact form of the EOM is unknown, the abstract function $f (\cdot)$ is utilized to express the resultant effects. In this formulation, the nonlinear effects can be included, which are difficult to cover explicitly in the physics-based formulation. Note that the manipulator may also contain some dynamic effects outside the input states $(θ, \dot{θ}, \ddot{θ})$ , such as jerk dynamics, which need $\dddot{θ}$ . These effects are ignored in the modeling work.

Equation (7) is utilized as the data-driven EOM, where the manipulator states $(θ, \dot{θ}, \ddot{θ})$ on the left side and the torque $τ_{\mathrm{motor}}$ on the right side are utilized as the input (i.e., feature) and output (i.e., label) states for machine learning, respectively. From the aspect of physics, when the manipulator moves at the states $(θ, \dot{θ}, \ddot{θ})$ , the joints of the manipulator at this specific instant should supply the joint torque $τ_{\mathrm{motor}}$ if the model is precise. Therefore, the performance of the model can be evaluated directly by comparing its estimated torque $τ_{\mathrm{motor\_estimated}}$ with the actual manipulator torque $τ_{\mathrm{motor}}$ . The dataset used for model training was experimentally generated using the TM5-700, which will be detailed in the “Performance of the dynamic models” section.
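The comparison between estimated and actual torque can be sketched with the root mean square error, which is the criterion named in the abstract. The function below is a minimal standard-library sketch; the sample torque values are made up for illustration and are not measurements from this work.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """Root mean square error between actual and estimated joint torque."""
    if len(actual) != len(estimated) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - e) ** 2 for a, e in zip(actual, estimated)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up torque samples (N·m) for one joint, for illustration only.
tau_actual = [1.0, 2.0, 3.0, 4.0]
tau_estimated = [1.0, 2.0, 3.0, 2.0]

error = rmse(tau_actual, tau_estimated)
```

A lower value indicates that the model reproduces the measured joint torque more closely over the evaluated trajectory.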

The complete dynamic model contains 6 DOFs, and the training process is divided into subgroups to reduce the model's complexity and the required dataset size. The 6-DOF articulated manipulator is generally designed to use the first three axes for translational motion (i.e., joints 1–3) and the last three axes for rotational motion (i.e., joints 4–6). The last three axes are compactly designed so that their rotational axes intersect orthogonally, so Pieper's solution can be deployed to algebraically or geometrically solve the inverse kinematics problem of the manipulator. Because of this orthogonality, joints 4 and 5 were trained individually, yet the position states of the other joints were imported so that the gravitational effects could be correctly considered. In contrast, the first three axes have a broad motion range, and their motion dynamics are highly coupled, especially the second and third axes. Therefore, the first three axes were trained together. Furthermore, because the motion of the last three axes is small compared to that of the first three axes, the motion of the last three axes was ignored when training the first three axes, and the last three axes were regarded as a small point mass m₄ (the sum of the masses of the last three axes) mounted at the end of the manipulator, as shown in Figure 1(c). The machine learning models of axes 1, 2, and 3 were thus established separately from the models of axes 4 and 5. We also tried to train the model by including all axes of the manipulator as inputs. However, the model's performance was not promising, even when the amount of data was doubled. Because the first three axes provide the translational motion of the manipulator, their effects are strongly coupled and should be trained together. Axes 4 and 5 are designed for rotational motion within a small range, so their effects can be separated from the first three axes to improve training efficiency.

As described in the introduction, three methods (DNN, LSTM, and XGBoost) were explored to evaluate their performance in constructing a data-driven dynamic model of the manipulator; their structure diagrams are shown in Figure 2. The neuron of the LSTM is different from that of the DNN: each LSTM cell has memory and is composed of an input gate, a cell state, a forget gate, and an output gate.¹⁵ XGBoost (Extreme Gradient Boosting) is a gradient-boosted decision tree method. At each step, the existing model is kept unchanged, and a new function is added to the model to correct the error of the previous trees and thereby improve the overall model. The structure shown on the far right side of Figure 2 is a set of classification and regression trees. Each leaf of the regression tree corresponds to a set of values, which are used as the basis for subsequent classification. The rightmost circles in the figure are the candidate outputs of the XGBoost model; which one becomes the final output depends on the result of the final judgment of the input value in the decision tree.¹⁶ Note that the utilized DNN package (keras) and XGBoost package (xgboost) have different architectures,⁴¹,⁴² where the former supports multiple outputs, but the latter does not. Therefore, the three estimations ( $τ_{1}, τ_{2}, τ_{3}$ ) of the model using XGBoost needed to be trained separately. Table 1 lists the features and labels of the training process. The training of joint 6 was skipped since it was not utilized in the following applications.

Figure 2.

Structural diagrams of (a) DNN, (b) LSTM, and (c) XGBoost. The leftmost and rightmost circles represent the input and output of the model, respectively. In the case of XGBoost, shown in (c), only one of the outputs is the actual output, depending on the result of the final judgment of the input value in the decision tree.

Table 1.

The features and labels of the manipulator used for training.

Joint group	Feature (rad, rad/s, rad/s²)	Label (N·m)
Translational axes (joints 1–3)	$θ_{1}, {\dot{θ}}_{1}, {\ddot{θ}}_{1}, θ_{2}, {\dot{θ}}_{2}, {\ddot{θ}}_{2}, θ_{3}, {\dot{θ}}_{3}, {\ddot{θ}}_{3}$	$τ_{1}, τ_{2}, τ_{3}$
Rotational axes (joints 4–5)	$θ_{1}, θ_{2}, θ_{3}, θ_{4}, θ_{5}, θ_{6}, {\dot{θ}}_{j}, {\ddot{θ}}_{j},\ j = 4, 5$	$τ_{j},\ j = 4, 5$

Unlike the DNN, which uses states of one moment to predict the states of the next moment, the LSTM uses time sequence as input data, and the XGBoost can be modified in the same manner. The structure of LSTM replaces neurons in the original hidden layer of the DNN model with LSTM units; thus, the underlying learning strategy was completely different. The LSTM is prone to overfitting, so the architecture is more complicated than the previous DNN, and dropout and recurrent_dropout must be considered. While the states of a series of timestamps were fed into the model, the input became two-dimensional as shown in Figure 3(a). For any predicted torque at any timestamp [k] shown in the blue block, the model requires the states at timestamp [k-n] to [k]. In our implementation using the LSTM method, $n = 9$ was utilized. The sampling period was $8 ms$ during the experiment, so the model used the data of the past $80 ms$ , shown in yellow blocks, as information for prediction. In the case of the first three axes, the number of states input into the model was 90 (i.e., 9 states multiplied by 10 timestamps). The model using XGBoost was also trained with time sequence data. Because XGBoost could not accept array inputs, the data needed to be flattened into a one-dimensional array as shown in Figure 3(b). For these models with sequential data, the abstracted form of the data-driven model shown in (7) is still applicable, but the required inputs $(θ, \dot{θ}, \ddot{θ})$ contain states at several different timestamps.
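The windowing described above can be sketched as follows, assuming the recorded states are stored as one row of nine values per sample for the first three axes. The helper names `make_windows` and `flatten` and the synthetic data are hypothetical and serve only to illustrate the two input shapes (a two-dimensional window for the LSTM and a flattened vector for XGBoost).

```python
from typing import List, Tuple


def make_windows(
    states: List[List[float]], torques: List[float], n: int = 9
) -> Tuple[List[List[List[float]]], List[float]]:
    """Pair the torque at timestamp k with the states from k-n to k.

    Each window holds n + 1 rows (timestamps), one row of states per timestamp.
    """
    windows, labels = [], []
    for k in range(n, len(states)):
        windows.append(states[k - n : k + 1])
        labels.append(torques[k])
    return windows, labels


def flatten(window: List[List[float]]) -> List[float]:
    """Flatten a (timestamps x states) window into a one-dimensional list."""
    return [value for row in window for value in row]


# Synthetic data for illustration: 20 samples, 9 states per sample
# (angle, velocity, and acceleration of joints 1 to 3).
num_samples, num_states = 20, 9
states = [
    [float(k * num_states + j) for j in range(num_states)]
    for k in range(num_samples)
]
torques = [float(k) for k in range(num_samples)]

windows, labels = make_windows(states, torques, n=9)  # LSTM-style 2-D inputs
flat_inputs = [flatten(w) for w in windows]           # XGBoost-style 1-D inputs
```

With $n = 9$, each window contains 10 timestamps of 9 states, so the flattened input has 90 entries, which matches the count given in the text for the first three axes.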

Figure 3.

The inputs/features and outputs/labels of the model using the LSTM method (a) and XGBoost method (b).

After describing the model construction, as well as its features and labels, the following paragraphs in this section describe the selection of hyperparameters, which need to be set before training. Using LSTM as an example, the hyperparameters include several factors, such as the learning rate, number of iterations, number of layers, and number of neurons in each layer. The choice of hyperparameters strongly affects the capability and performance of the model, but the selection of high-quality hyperparameters is very complicated. Thus, a methodical search strategy is crucial.⁴³–⁴⁵ Four methods are commonly used to adjust hyperparameters: manual, grid search, random search, and Bayesian optimization. The manual method mainly relies on testing and training a variety of different architectures individually and then adjusting the hyperparameters according to the result of the loss function or objective function. The grid search method defines the values to be searched for each type of hyperparameter. The program runs through all the hyperparameter combinations to test and select the best parameters based on the loss function or objective function, which is time-consuming because every combination in the Cartesian product of the hyperparameter values must be tried. Although this brute-force method is time-consuming, it can find the best solution as long as the searched hyperparameter values are sufficiently comprehensive. The random search method is similar to the grid search method. The difference is that it randomly selects combinations of parameters to search. It may therefore be more economical in time and computing resources than the grid search method, but the search may be insufficient. Finally, Bayesian optimization uses the results of previous trials and a probability model to select the hyperparameters for the next iteration.⁴⁶

This work used two methods to adjust the hyperparameters of machine learning. One was grid search due to its complete search of the parameter space, and the other was Bayesian optimization due to its efficiency. The work then compared the performance of the manipulator models using these two adjusting methods. GridSearchCV in the scikit-learn module was utilized.⁴⁷ For a DNN model with multiple hidden layers and a large amount of training data, a grid search over all hyperparameters simultaneously is so time-consuming that it is impractical. For example, a model with 10 hyperparameters, each with nine variations, would have to run $9^{10}$ trials to find the best hyperparameters. Even comparing the search results would be time-consuming. Therefore, an assisted lookup table was used to update the best hyperparameter values immediately during the search process, as shown in Algorithm 1. Thus, the grid search could adjust each hyperparameter independently, so only 9 $\times$ 10 attempts were required. Since changes in the neurons and activation functions of a preceding hidden layer affect the next hidden layer, it was necessary to adjust the parameters repeatedly to confirm their consistency. Therefore, assuming that the process needs to be repeated three times, the number of attempts becomes 9 $\times$ 10 $\times$ 3. In this work, 492,038 data points (80% for the training set and 20% for the validation set) were utilized, and the model was set to have seven hidden layers, each with its own number of neurons and activation function. Therefore, there were 14 hyperparameters to adjust in total, and it took about 2–3 days to execute using an ordinary desktop computer equipped with an Intel i7-6800k CPU, NVIDIA RTX 2080 GPU, and 16 GB of RAM. Note that the comprehensiveness of the hyperparameter space determines the quality of the searched hyperparameters.
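To make the saving concrete, the trial counts in the example above (10 hyperparameters, nine candidate values each, three repeated passes) work out as follows:

```latex
\text{full grid search: } 9^{10} = 3{,}486{,}784{,}401 \text{ trials}

\text{independent search, one pass: } 9 \times 10 = 90 \text{ trials}

\text{independent search, three passes: } 9 \times 10 \times 3 = 270 \text{ trials}
```

The independent search therefore needs roughly seven orders of magnitude fewer trials, at the cost of not exploring interactions between hyperparameters within a single pass, which is why the passes are repeated.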

Algorithm 1.

Automatic independent hyperparameter grid search

Take NN with two hidden layers as an example (with Python scikit-learn GridSearchCV).

Assuming that you only want to adjust part of the hidden layer, the initial value is defined as follows.

Initial parameter definition:

Record hyperparameter:

i. HyperParaList = ['neu1', 'acti1', 'neu2', 'acti2', 'optimizer', 'lr'].

ii. HyperParaValue = [64, 'linear', 64, 'linear', 'Adam', 0.001].

Hyperparameter value lookup table:

i. neuron = [64, 128, 256, 512, 1024, 2048].

ii. activation = ['softplus', 'softsign', 'relu', 'tanh', 'selu', 'linear', 'elu'].

iii. optimizer = ['SGD', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam'].

iv. lr = [0.0005, 0.001, 0.0015, 0.002, 0.0025, 0.003].

Output:

Every time a single hyperparameter adjustment is completed, the best value recorded by HyperParaValue is printed and saved to a csv file:

Ex.: HyperParaValue: [128, 'tanh', 128, 'tanh', 'Adam', 0.0005].

Procedure:

Step 1. Read the first entry of HyperParaList and HyperParaValue as the hyperparameter for GridSearchCV to adjust.

The first value of HyperParaList is 'neu1' and its type is neuron, so GridSearchCV will use neuron as the search table. Other hyperparameter values are defined according to HyperParaValue.

Step 2. After the GridSearchCV search is completed, the best value overwrites the position in HyperParaValue that corresponds to the searched entry of HyperParaList.

Step 3. Repeat steps 1 and 2 until the end of the list to finish one complete NN hyperparameter search.

Step 4. Then, you can search again from the beginning based on the results of this complete search to ensure that the hyperparameters are stable.
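The procedure of Algorithm 1 can be sketched in plain Python as shown below. This is a sketch under stated assumptions rather than the implementation used in this work: the call to GridSearchCV is replaced by a generic `evaluate` callable that returns a loss for a given hyperparameter setting, the function name `independent_grid_search` and the `toy_loss` function are hypothetical, and a shortened hyperparameter list is used for brevity.

```python
import math
from typing import Any, Callable, Dict, List


def independent_grid_search(
    names: List[str],
    values: List[Any],
    kinds: Dict[str, str],
    lookup: Dict[str, List[Any]],
    evaluate: Callable[[Dict[str, Any]], float],
    passes: int = 2,
) -> List[Any]:
    """Tune one hyperparameter at a time while holding the others fixed.

    `evaluate` returns a loss (lower is better); it stands in for one
    GridSearchCV-style training and validation run.
    """
    best = list(values)
    for _ in range(passes):                      # Step 4: repeat for stability
        for i, name in enumerate(names):         # Steps 1 and 3: walk the list
            best_loss, best_candidate = math.inf, best[i]
            for candidate in lookup[kinds[name]]:
                trial = dict(zip(names, best))   # others fixed at current best
                trial[name] = candidate
                loss = evaluate(trial)
                if loss < best_loss:
                    best_loss, best_candidate = loss, candidate
            best[i] = best_candidate             # Step 2: overwrite the record
    return best


# Shortened example based on the lists in Algorithm 1.
hyper_para_list = ["neu1", "acti1", "neu2", "lr"]
hyper_para_value = [64, "linear", 64, 0.001]
kinds = {"neu1": "neuron", "acti1": "activation", "neu2": "neuron", "lr": "lr"}
lookup = {
    "neuron": [64, 128, 256, 512, 1024, 2048],
    "activation": ["softplus", "softsign", "relu", "tanh", "selu", "linear", "elu"],
    "lr": [0.0005, 0.001, 0.0015, 0.002, 0.0025, 0.003],
}


def toy_loss(params: Dict[str, Any]) -> float:
    """Hypothetical loss surface used only to exercise the search."""
    loss = abs(params["neu1"] - 128) / 64 + abs(params["neu2"] - 256) / 64
    loss += 0.0 if params["acti1"] == "tanh" else 1.0
    loss += abs(params["lr"] - 0.0005) * 1000
    return loss


result = independent_grid_search(
    hyper_para_list, hyper_para_value, kinds, lookup, toy_loss
)
```

In practice `evaluate` would train and validate the network for the given setting, so each call is expensive; the benefit of the coordinate-wise scheme is that the number of calls grows with the sum, not the product, of the candidate list lengths.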

The Bayesian optimization search method, in contrast, builds a probability model of the relationship between the previously tried hyperparameters and the resulting values of the loss function. Each time the hyperparameters were adjusted, this probability model was used to estimate where the loss function is minimal, so the adjustment of the hyperparameters was executed more efficiently.⁴¹,⁴⁴ The objective function of the Bayesian optimization used in this work was the L2 loss, or mean square error (MSE):

\mathrm{MSE} = \frac{1}{m} \sum_{k = 1}^{m} (y_{k} - \hat{y}_{k})^{2}

(8)

where $y_{k}$ and $\hat{y}_{k}$ are the k-th actual and estimated values, respectively, and $m$ is the number of samples. The lower the value of the objective function, the better the solution.

The physics-data hybrid models

While either the physics-based model or the data-driven model can be separately utilized to model the manipulator's dynamic behaviors, these two models can also be combined, forming the so-called physics-data hybrid model. In this method, the data-driven part acts as the compensation term to cover the un-modeled dynamics ignored in the physics-based part. This is advantageous because inclusion of the data-driven part increases the model accuracy, yet the required data size is not as large as that of the pure data-driven model, which requires a large dataset to capture the complex dynamic behavior of the manipulator.

Various compositions could be utilized to form the hybrid models. The first trial in this work involved taking all the physics-based terms from (3) and (6) into account and using a data-driven model ( $f_{h 1} (θ, \dot{θ}, \ddot{θ})$ ) to compensate for un-modeled effects; this model is hereafter referred to as Model H1:

EOM_{physics} (θ, \dot{θ}, \ddot{θ}) + τ_{loss} (θ, \dot{θ}, \ddot{θ}) + f_{h1} (θ, \dot{θ}, \ddot{θ}) = τ_{motor}

(9)

The second trial considered motion dynamics without loss modeling and used a data-driven model ( $f_{h2} (θ, \dot{θ}, \ddot{θ})$ ) to model the loss and compensate for un-modeled effects; this model is hereafter referred to as Model H2:

EOM_{physics} (θ, \dot{θ}, \ddot{θ}) + f_{h2} (θ, \dot{θ}, \ddot{θ}) = τ_{motor}

(10)

In contrast, the third trial kept the loss part and used a data-driven model ( $f_{h3} (θ, \dot{θ}, \ddot{θ})$ ) to model motion dynamics and other model effects; this model is hereafter referred to as Model H3:

τ_{loss} (θ, \dot{θ}, \ddot{θ}) + f_{h3} (θ, \dot{θ}, \ddot{θ}) = τ_{motor}

(11)

The models constructed in this section are summarized in Table 2. The data-driven-only model is referred to as Model D1, which utilizes DNN-based architecture.

Table 2.

Composition of various models with physics-based and/or data-driven portions.

Model	$f_{1} (θ, \dot{θ}, \ddot{θ}) = M (θ) \ddot{θ} + C (θ, \dot{θ}) \dot{θ} + G (θ)$	$f_{2} (θ, \dot{θ}, \ddot{θ}) = B_{m} \ddot{θ} + C_{m} \dot{θ} + f_{c} (\mathrm{sign} (\dot{θ}))$	Data-driven model
Model P1	V
Model P2	V	V
Model D1			V
Model H1	V	V	V
Model H2	V		V
Model H3		V	V
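The composition of Model H1 in (9) can be illustrated with a toy residual-learning sketch. Everything numeric here is an assumption made for illustration: a one-joint system replaces the six-joint manipulator, the coefficients in `eom_physics` and `tau_loss` are invented, the "measured" torque is synthetic, and a linear least-squares fit on hand-picked features stands in for the DNN used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def eom_physics(q, dq, ddq):
    # Toy 1-DOF stand-in for M(q)*ddq + C(q, dq)*dq + G(q); coefficients are illustrative.
    return 0.5 * ddq + 2.0 * np.sin(q)


def tau_loss(dq, ddq):
    # Loss terms in the form of Table 2: B_m*ddq + C_m*dq + f_c*sign(dq).
    return 0.05 * ddq + 0.3 * dq + 0.4 * np.sign(dq)


def features(q, dq, ddq):
    # Hand-picked basis for the residual learner (hypothetical; the paper uses a DNN).
    return np.column_stack([np.sin(2 * q), np.cos(2 * q), dq, ddq, np.ones_like(q)])


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic "measured" torque: modeled terms plus an un-modeled ripple and sensor noise.
n = 2000
q = rng.uniform(-np.pi, np.pi, n)
dq = rng.normal(0.0, 1.0, n)
ddq = rng.normal(0.0, 2.0, n)
tau_motor = eom_physics(q, dq, ddq) + tau_loss(dq, ddq) + 0.6 * np.sin(2 * q) + rng.normal(0.0, 0.05, n)

# Model P2: physics-based motion dynamics plus loss terms only.
tau_p2 = eom_physics(q, dq, ddq) + tau_loss(dq, ddq)

# Model H1: fit f_h1 on the residual that P2 leaves behind, then add it back.
train, test = slice(0, 1500), slice(1500, None)
w, *_ = np.linalg.lstsq(features(q[train], dq[train], ddq[train]), (tau_motor - tau_p2)[train], rcond=None)
tau_h1 = tau_p2 + features(q, dq, ddq) @ w

rmse_p2 = rmse(tau_p2[test], tau_motor[test])
rmse_h1 = rmse(tau_h1[test], tau_motor[test])
print(rmse_p2, rmse_h1)
```

Because the learner only has to represent the residual, it can be small and needs comparatively little data, which is the motivation given above for the hybrid form. Models H2 and H3 follow the same pattern with a larger share of the torque left to the learned term.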

Motion data generation of the manipulator

Data collection is one of the key elements of the machine learning process, and data quality strongly determines the performance of the learned model. This is especially true for complex dynamic systems, which have many states coupled with each other. The manipulator in this work is a typical complex dynamic system, for which complete global training data are difficult to collect.⁴⁸ Coupled states are especially difficult to cover because they include not only position but also velocity and acceleration data. Because data diversity is crucial, it is important to collect a sufficient amount of broadly diverse data efficiently. This section describes the trajectory generation method, which quickly generates diversified training data.

Motion data generation of the manipulator includes three steps. First, the motion range of each joint of the manipulator is discretized at a selected angular interval, so the originally infinite set of possible joint angles is reduced to a finite set of angular configurations. For example, if all six joints of the manipulator are discretized into N segments, there exist $(N + 1)^{6}$ manipulator configurations. Second, the configurations that cause collisions are removed. Third, trajectories of the manipulator are generated by permutation of the remaining manipulator configurations. For example, if the manipulator has M configurations, there exist $P_{2}^{M}$ possible trajectories. The trajectories are organized into a list, where the end point of one trajectory matches the starting point of the next trajectory. Thus, the $P_{2}^{M}$ trajectories are arranged into one continuously movable trajectory. For each trajectory, the manipulator is set to move at V different speeds, so $P_{2}^{M} V$ trajectories are generated from the original manipulator configurations. Thus, motions with various dynamics are included in the data. Data with $θ$ , $\dot{θ} = 0$ but $τ > 0$ are also included, since this is the stage in which the motors in the robot joints are trying to overcome the friction force. With these considerations, the data enhance the accuracy of the physics-data hybrid model.

The empirical implementation of trajectory generation on the TM5-700 is described as follows. First, the rotation ranges of the first to the sixth joints are $\pm 270^{\circ}$ , $\pm 80^{\circ}$ , $\pm 150^{\circ}$ , $\pm 180^{\circ}$ , $\pm 180^{\circ}$ , and $\pm 270^{\circ}$ , respectively. In practical applications, different joints use different sets of angle values, depending on the motion range of the manipulator. The values also include both ends of the motion range, so the whole rotation range is covered. Increasing the number of joint angles of all axes increases the coverage of the manipulator workspace. Note that both the position and the speed of the joint should be varied. Therefore, the number of position and speed combinations increases dramatically if more joint angles are set. We consider this work to be our first trial, so the number of joint angles is set at values feasible for training and experiments. Second, the manipulator is modeled using oriented bounding boxes⁴⁹ as shown in Figure 4, and the model is then utilized for a collision check using the separating axis theorem.⁵⁰ The surrounding objects residing within the workspace of the manipulator are also modeled and checked for collision to ensure that the manipulator can be driven safely. After the collision check, four configurations are left, so there are six trajectories after permutation. Each trajectory is set to run at three different speeds, so 18 trajectories in total are generated for data collection.
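The three generation steps can be sketched as follows. This is an illustration under assumptions, not the implementation used on the TM5-700: the two-joint ranges, the speed values, and the `collides` rule are hypothetical, and the collision rule is a placeholder rather than a bounding-box check. Chaining the segments into one continuous motion is not covered.

```python
from itertools import permutations, product


def joint_grid(lower, upper, n_segments):
    # N segments give N + 1 angles, including both ends of the motion range.
    step = (upper - lower) / n_segments
    return [lower + i * step for i in range(n_segments + 1)]


def generate_trajectories(joint_ranges, n_segments, speeds, in_collision):
    # Step 1: discretize every joint and enumerate all configurations.
    grids = [joint_grid(lo, hi, n_segments) for lo, hi in joint_ranges]
    # Step 2: drop the configurations that fail the collision check.
    configs = [c for c in product(*grids) if not in_collision(c)]
    # Step 3: ordered (start, goal) pairs, i.e. M * (M - 1) segments, each run at every speed.
    segments = list(permutations(configs, 2))
    return configs, [(start, goal, v) for start, goal in segments for v in speeds]


# Hypothetical two-joint example (degrees); the rule below is a placeholder collision test.
ranges = [(-90.0, 90.0), (-80.0, 80.0)]
speeds = [0.25, 0.5, 1.0]


def collides(config):
    return abs(config[0]) > 45 and config[1] < 0


configs, trajectories = generate_trajectories(ranges, 2, speeds, collides)
print(len(configs), len(trajectories))
```

Note that four configurations give 12 ordered pairs but 6 unordered pairs. The six trajectories reported above for four configurations match the unordered count, which `itertools.combinations` would produce in place of `permutations`.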

Figure 4.

The combination of simplified bounding-box model of the manipulator and hybrid dynamic model of the TM5-700 as a simulator.

Performance of the dynamic models

Performance comparison of physics-based, data-driven, and hybrid dynamic models

The models listed in Table 2 were trained using the DNN-based data-driven model with empirical motion data of a TM5-900 manipulator. The features and label details are listed in Table 1. The DNN architecture used seven hidden layers. Each hidden layer used batch normalization, which sped up training, reduced the need for dropout, and increased the likelihood of early stopping. The numbers of neurons in the hidden layers were 512, 1024, 1024, 1024, 1024, 1024, and 512. The hidden layers used the tanh activation function, and the last layer was the output layer with a linear activation function. The loss function was the mean absolute error, the optimizer was Adam, and validation_split = 0.2 was used, which indicates that 80% and 20% of the data were training and validation data, respectively.

The evaluation was executed using the motion of the first three axes of the manipulator (i.e., joints 1–3), which were highly coupled and dynamic. A total of 23,598 training data points were collected and shuffled before being fed to the models. In addition to the training and validation data, test data were generated using six randomly selected manipulator motion trajectories, each of which was run five times. Thus, 90 joint trajectories in total (6 trajectories × 5 runs × 3 joints) were used for testing. For each trajectory, the root mean square error (RMSE) between the true torque generated by the manipulator ( $τ_{motor\_real}$ ) and the torque estimated by the models ( $τ_{motor}$ ) was computed:

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (τ_{motor, i} - τ_{motor\_real, i})^{2}}

(12)

where N indicates the number of data points. Thus, for each model, there were 90 RMSE values. To evaluate each model statistically, the mean, standard deviation, maximum, and minimum of these RMSE values were computed as the performance indices of the model.
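The evaluation procedure in (12) and the four summary statistics can be sketched as below. The torque values are made-up toy numbers, the function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which estimator was used.

```python
import numpy as np


def trajectory_rmse(tau_est, tau_real):
    # RMSE between estimated and measured joint torque over one trajectory.
    tau_est = np.asarray(tau_est, dtype=float)
    tau_real = np.asarray(tau_real, dtype=float)
    return float(np.sqrt(np.mean((tau_est - tau_real) ** 2)))


def summarize(rmse_values):
    # Mean, SD, max and min over the per-trajectory RMSE values.
    # Assumption: sample standard deviation (ddof=1).
    r = np.asarray(rmse_values, dtype=float)
    return {"mean": float(r.mean()), "sd": float(r.std(ddof=1)),
            "max": float(r.max()), "min": float(r.min())}


# Two toy trajectories (values in N·m, invented for illustration).
rmses = [
    trajectory_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),
    trajectory_rmse([0.0, 0.0], [3.0, 4.0]),
]
stats = summarize(rmses)
print(rmses, stats)
```

In the paper's setting the list `rmses` would hold 90 values per model, one per joint trajectory.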

Table 3 lists the performance of the various models constructed in “Dynamic models” section. The results reveal that the physics-based model without loss (i.e., P1) is not realistic and has large RMSEs. Adding the physics-based loss term greatly improves the model's performance (i.e., P2). Adding the data-driven part lowers the mean RMSE further (i.e., H1), which indicates that the DNN captures the un-modeled dynamics of the manipulator. The hybrid models H2 and H3 used the data-driven model to replace the physics-based loss terms and the motion dynamics of the model, respectively. However, given the same dataset for training, the additional unknown dynamics that must be learned decrease the performance of the models. The fact that H3 performs worse than H2 indicates that motion dynamics are more complex and difficult to learn than loss dynamics.

Table 3.

Statistical performance of various models based on RMSEs between the torque generated by the manipulator and estimated by the models (unit:N·m).

Model	Mean	SD	Max	Min
Model P1	9.554	3.297	18.171	3.754
Model P2	3.990	0.955	6.316	2.256
Model H1	3.656	1.097	8.172	2.314
Model H2	5.440	1.332	7.501	2.393
Model H3	7.026	4.298	20.033	2.522
Model D1	8.911	4.799	24.257	3.022

Shaded entries indicate either better results or important findings.

Architecture and hyperparameter adjustment of the data-driven models

Table 3 also reveals that with the same dataset, the pure data-driven model D1, which needs to learn all the dynamics of the manipulator, has the second-largest mean RMSE among the models listed in Table 2, better only than the loss-free physics-based model P1, and the largest standard deviation and maximum. This result suggests that the amount of training data may be inadequate, or that the chosen hyperparameters are unsuitable for learning manipulator dynamics. Therefore, additional evaluations were executed to address these concerns. Table 4 shows the performance of the data-driven model with different amounts of training data and the same DNN architecture. The first two rows are identical to the data shown in Table 3 and serve as the reference. The table clearly shows that the mean RMSE decreases when the amount of training data increases. The increase in training data effectively improves the performance of the data-driven model, but it is still not as good as the physics-based model P2.

Table 4.

Statistical performance of the DNN-based models with different amounts of training data (unit:N·m).

Model	Training data	Mean	SD	Max	Min
Model P2		3.990	0.955	6.316	2.256
Model D1	23,598	8.911	4.799	24.257	3.022
	64,840	8.330	3.416	16.068	3.063
	95,717	7.141	2.492	12.100	3.547
	331,481	6.686	4.045	19.108	2.459
	492,038	6.093	2.281	10.764	2.483

Shaded entries indicate either better results or important findings.

Following the results shown in Table 4, the hyperparameters of the DNN-based model were adjusted using a grid search and Bayesian optimization. Meanwhile, the variations of the model with different loss functions and layers were evaluated. The DNN-based models with hyperparameter adjustment are hereafter referred to as Model D2. In addition to the 7-hidden-layer architecture and L2 loss utilized in the model, the 3-hidden-layer architecture and L1 loss were evaluated, where the L1 loss represents the mean absolute error (MAE):

MAE = \frac{1}{m} \sum_{k = 1}^{m} | y_{k} - \hat{y}_{k} |

(13)
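The practical difference between the two losses in (8) and (13) is how they weight large errors. The toy comparison below uses invented numbers and plain Python; it only illustrates the definitions and says nothing about which loss suits the manipulator data.

```python
def mse(y, y_hat):
    # Mean square error, as in (8).
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def mae(y, y_hat):
    # Mean absolute error, as in (13).
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


y_true = [1.0, 2.0, 3.0, 4.0]
y_small = [1.5, 2.5, 3.5, 4.5]  # four errors of 0.5 each
y_spike = [1.0, 2.0, 3.0, 6.0]  # a single error of 2.0

print(mae(y_true, y_small), mae(y_true, y_spike))  # same MAE for both predictions
print(mse(y_true, y_small), mse(y_true, y_spike))  # MSE is four times larger for the spike
```

Both predictions have the same MAE, while the MSE penalizes the single large error more strongly, so a model trained with MSE is pushed harder to avoid torque spikes.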

Table 5 lists the results. A total of 492,038 training data points were utilized. The first two rows are identical to the data shown in Table 4 and serve as the reference. The rows labeled grid search represent models with different numbers of hidden layers and loss functions, all with hyperparameters tuned by a grid search, where Algorithm 1 was utilized to speed up the tuning process. The last row represents the model whose hyperparameters were adjusted using Bayesian optimization.

Table 5.

Statistical performance of the DNN-based models with different architectures and hyperparameters (unit:N·m).

Model	Type	Layers	Hyperparameter adjustment	Loss	Mean	SD	Max	Min
Model P2				RMSE	3.990	0.955	6.316	2.256
Model D1	DNN	7		RMSE	6.093	2.281	10.764	2.483
Model D2	DNN	7	Grid search	MSE	4.702	1.249	8.065	2.446
	DNN	7	Grid search	MAE	4.975	1.339	9.346	2.993
	DNN	3	Grid search	MSE	4.744	1.444	9.58	2.59
	DNN	7	Bayesian optimization	MSE	4.777	1.416	8.341	2.506

Shaded entries indicate either better results or important findings.

Table 5 reveals that the mean RMSE values of the D2 models, whose hyperparameters were adjusted using either grid search or Bayesian optimization, are similar, and all are smaller than that of Model D1. This indicates that hyperparameter adjustment is very helpful in improving learning performance. The table also shows that the model with MSE loss is slightly better than the one with MAE loss, and that the model with three hidden layers achieves a performance similar to that of the model with seven hidden layers. In the current setup, using more than three hidden layers does not improve performance and only increases the training and tuning effort. Hyperparameter adjustment using Bayesian optimization performs similarly to the grid search. However, the parameter search time was reduced from 48 h for the grid search to 4 h, so it is a very time-effective method. In addition to the reported models, we tried many other DNN-based models with different settings, but the performance of the models seemed limited. Therefore, other variations of the model were then explored.

The LSTM-based model takes a sequence of historical data as the input, which intuitively helps to capture the dynamics of the manipulator. The utilized LSTM model is a two-layer LSTM, with Gaussian noise inserted between the two LSTM layers and the final two fully connected layers. The LSTM model is referred to as Model D3. Table 6 shows the results, where the first two rows are identical to the data shown in Table 5 and serve as the reference. The table shows that the LSTM model (i.e., Model D3) can indeed reduce the overall average RMSE, which confirms that time-series features can improve machine learning performance. However, although the LSTM model performs better than the DNN model, it is still not as good as the physics-based model. Therefore, data-driven models with different architectures and principles were then explored.

Table 6.

Statistical performance of the models (unit:N·m).

Model name	Type	Mean	SD	Max	Min
Model P2	Physics	3.990	0.955	6.316	2.256
Model D2	DNN	4.702	1.249	8.065	2.446
Model D3	LSTM	4.151	1.673	9.139	2.205
Model D4	XGBoost	3.803	1.444	9.462	2.017

An XGBoost model was then developed, and its performance was compared with that of the DNN and LSTM models. The architecture of the decision-tree models is relatively simple, so the hyperparameter search is much easier than for the NN method. There was no need to define each hidden layer, select an order arrangement, or, for example, decide whether to use dropout or batch normalization or which activation function to use. The XGBoost model in this work included the adjustment of the following 10 hyperparameters: learning_rate, max_depth, min_child_weight, colsample_bytree, subsample, reg_alpha, reg_lambda, gamma, n_estimators, and seed. The XGBoost model is hereafter referred to as Model D4. Table 6 shows that although the RMSE standard deviation and RMSE maximum of the XGBoost model are larger than those of the physics-based model, its RMSE mean is smaller. This indicates that the pure data-driven model can effectively model the complicated dynamic motion of the manipulator.

In summary, the results reported in this section led to the following conclusions. First, with about 500k training data points, the XGBoost model (i.e., Model D4) performs similarly to the physics-based model with both motion dynamic terms and loss terms (i.e., Model P2), judged by the RMSE between the actual manipulator torque and the torque estimated by the model. Second, the hybrid model with physics-based and data-driven terms (i.e., Model H1) has the best performance among all models based on the same RMSE criterion, and it only needs about 24k training data points. In terms of training data collection and training time, this is a very effective method to model the manipulator dynamics when compared to the pure data-driven model. However, the construction of the physics-based model of the manipulator is not trivial. Third, across the amounts of training data tested, increasing the amount of training data improves performance, which matches the general observation for learning models. Fourth, the adjustment of hyperparameters also improves the model's performance. Bayesian optimization performs similarly to the grid search method but requires much less time. Fifth, given the same dataset, the LSTM with sequential data performs better than the DNN with instant data, and XGBoost outperforms the LSTM model.

Note that many data-driven models were evaluated in this work. The fundamental idea was that we wanted to determine which data-driven model is suitable for modeling ordinary differential equations (ODEs). Once a suitable model is found, the work can be easily extended to other systems that move, as all movable systems obey Newton's Second Law and can be expressed as second-order ODE systems. However, because movable systems appear in many different forms or compositions, only some of them have physics-based models constructed. If a new system is designed, especially a high DOF system, it is also challenging to construct its physics-based model. Therefore, we think that the data-driven approach has merit and is worth trying.

Design of a virtual force sensor

The states of the end effector and its force interaction with the environment are important information for manipulator operation. While the states of the end effector can easily be obtained from the manipulator joint configuration with forward kinematics, the force interaction is difficult to obtain, as it includes three forces and three torques. One general approach is to install a six-axis force sensor on the end effector. However, this greatly increases the hardware cost of the manipulator.²⁸ Following the modeling work presented in previous sections, the joint torque of the manipulator estimated using machine learning is further developed into an estimation of the interaction forces between the end effector and the environment. This includes two steps: estimating the torque at the joints that is utilized to generate the forces at the end effector, and establishing the mapping between the former and the latter.

The external torque observer

As described in (5), if the manipulator moves freely within the workspace with states $(θ, \dot{θ}, \ddot{θ})$ , the joints are required to generate $τ_{motor}$ to support this motion. Here, $τ_{motor}$ is rewritten as $τ_{motor\_free}$ to make the presentation clearer:

EOM_{physics} (θ, \dot{θ}, \ddot{θ}) = M (θ) \ddot{θ} + C (θ, \dot{θ}) \dot{θ} + G (θ) = Q^{'} = τ_{motor\_free} - τ_{loss}

(14)

When the manipulator has some payload or has a force interaction with the environment, the torque generated at the joints needs to be adjusted:

τ_{motor} = τ_{motor\_free} + τ_{ext}

(15)

More specifically, part of the total joint torque ( $τ_{motor}$ ) is transmitted to the end effector for force interaction. This torque is referred to as the external torque ( $τ_{ext}$ ). The remaining torque is utilized to generate the manipulator's own dynamics and to compensate for the loss. Therefore, the external torque can be derived as

τ_{ext} = τ_{motor} - τ_{motor\_free} = τ_{motor} - EOM_{physics} (θ, \dot{θ}, \ddot{θ}) - τ_{loss}

(16)

Empirically, $τ_{motor}$ was directly available from the motor drive board of the commercial manipulator utilized in this work (TM5-700, Techman Robot Inc.), so we just needed to log the torque data when collecting the experimental data for model learning. The $τ_{motor\_free}$ was estimated using the data-driven model developed in the “Dynamic models” and “Performance of the dynamic models” sections.

In the empirical implementation, because the external torque $τ_{ext}$ computed from $τ_{motor}$ and $τ_{motor\_free}$ is noisy, the Kalman filter (KF)⁵¹ was utilized:

\hat{τ}_{ext} = Kalman [τ_{ext}]

(17)

The KF system contained only one torque state without introducing other motion states. In the time update of the KF process, the prediction model set the predicted value equal to the previous value. The measurement update utilized the computed torque $τ_{ext}$ as shown in (17). The measurement noise was obtained by analyzing the model noise and joint-torque noise, and the process noise was adjusted through actual experiments. The overall architecture of the external torque observer is shown in Figure 5.
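A single-state filter of the kind described above can be sketched as follows, assuming a scalar state per joint, a "same as previous value" prediction, and the raw external torque from (16) as the measurement. The noise variances, the initial state, and the synthetic torque signals are illustrative assumptions, not the values tuned on the manipulator.

```python
import numpy as np


def kalman_filter_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a constant-value prediction model."""
    x, p, filtered = x0, p0, []
    for z in measurements:
        # Time update: the state is predicted to stay at its previous value.
        p = p + process_var
        # Measurement update with the raw external torque.
        k = p / (p + measurement_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        filtered.append(x)
    return np.array(filtered)


# Synthetic example for one joint (values in N·m, invented for illustration).
rng = np.random.default_rng(1)
tau_motor = 5.0 + rng.normal(0.0, 0.5, 500)  # noisy measured joint torque
tau_motor_free = np.full(500, 3.0)           # free-motion torque from the dynamic model
tau_ext_raw = tau_motor - tau_motor_free     # raw external torque, as in (16)

# Assumed variances: 0.25 matches the synthetic noise; 1e-4 is a hand-picked process noise.
tau_ext_hat = kalman_filter_1d(tau_ext_raw, process_var=1e-4, measurement_var=0.25)
print(tau_ext_raw[-5:], tau_ext_hat[-5:])
```

A smaller process variance gives a smoother estimate but a slower response to real contact events, which is one reason this parameter is typically tuned experimentally.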

Figure 5.

The architecture of the external torque observer.

Virtual force sensor

The virtual force sensor uses XGBoost models, which performed best among the data-driven models in the analysis above. Given the estimated external torque ${\hat{τ}}_{ext}$ , the corresponding force/torque on the end effector could be estimated to serve as the virtual force sensor. The mapping from the external joint torque to the forces/torque at the end effector can be computed using the Jacobian in the force domain,³¹ which can also be regarded as a quasi-static estimation of the force flow in the manipulator. Because the dynamic motion of the manipulator can be approximated using a data-driven model, the Jacobian, which is based on the kinematic relationship, should also be able to be modeled using a data-driven model. Therefore, instead of deriving the Jacobian of the manipulator, the virtual force sensor of the manipulator was developed using a data-driven approach.
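For reference, the standard quasi-static relation behind the Jacobian-based mapping can be written as below. The symbols $J (θ)$ (the geometric Jacobian) and $W$ (the end-effector wrench) are introduced here for illustration and do not appear in the original text. The inverse form assumes a six-joint arm away from singular configurations, and sign conventions depend on whether $W$ is the force applied by or to the end effector.

```latex
\hat{τ}_{ext} = J (θ)^{T} W, \qquad
W = \begin{bmatrix} F_{X} & F_{Y} & F_{Z} & M_{X} & M_{Y} & M_{Z} \end{bmatrix}^{T}, \qquad
W = J (θ)^{-T} \hat{τ}_{ext}
```

A data-driven virtual force sensor learns a mapping of this kind from joint angles and external torque to selected wrench components, without forming $J (θ)$ explicitly.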

In this work, a three-axis virtual force sensor was developed using a data-driven model, covering a normal force and two tilting moments of the end effector as shown in Figure 6(a). When the manipulator is posed in certain configurations, the forces/torque on the end effector can be roughly approximated using a simpler static force relationship. This work utilized a peg-in-hole application to demonstrate the virtual force sensor, and the most common configuration of the manipulator in this application is shown in Figure 6(b). In this configuration, the estimation of the two moments ( $M_{X}, M_{Y}$ ) mainly relied on the external torques of the 4th and 5th joints ( ${\hat{τ}}_{ext4}, {\hat{τ}}_{ext5}$ ):

M_{X} = f_{M_{X}} (θ_{1}, θ_{2}, θ_{3}, θ_{4}, θ_{5}, θ_{6}, \hat{τ}_{ext5})

(18)

M_{Y} = f_{M_{Y}} (θ_{1}, θ_{2}, θ_{3}, θ_{4}, θ_{5}, θ_{6}, \hat{τ}_{ext4})

(19)

Figure 6.

The three-axis virtual force sensor developed in this work: (a) notations of the three estimated axes; (b) the major configuration of the manipulator when the sensor was utilized.

As for the normal force $F_{Z}$ , it was modeled as

F_{Z} = f_{F_{Z}} (θ_{1}, θ_{2}, θ_{3}, θ_{4}, θ_{5}, θ_{6}, \hat{τ}_{ext1}, \hat{τ}_{ext2}, \hat{τ}_{ext3})

(20)

Considering the force flow of the manipulator, the forces on the end effector were propagated (or supported) by all the joints of the manipulator. Therefore, the torque at the first three axes was utilized to estimate $F_{Z}$ , and that of the 4th and 5th joints was reserved to estimate $(M_{X}, M_{Y})$ . Similar to the modeling work in the “Dynamic models” and “Performance of the dynamic models” sections, these models utilized the XGBoost architecture. Because the training data must reflect the inertia of the robotic arm, the features comprised the angle, angular velocity, and angular acceleration over 10 time steps to predict the joint torque of the manipulator.
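The 10-step feature construction mentioned above can be sketched as a sliding window that flattens the recent joint states into one feature row per sample. The alignment is an assumption: here each window ends at the sample whose torque is predicted, which the text does not specify. The function name `make_windows` and the toy arrays are hypothetical.

```python
import numpy as np


def make_windows(states, targets, n_steps=10):
    """Stack the last n_steps state rows into one flat feature vector per sample.

    states:  (T, D) array, e.g. angle, angular velocity and acceleration per joint.
    targets: (T,) array of the quantity to predict at each time step.
    """
    states = np.asarray(states, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n_samples = states.shape[0]
    # Assumption: the window covers steps t - n_steps + 1 ... t and predicts the target at t.
    features = np.stack(
        [states[t - n_steps + 1 : t + 1].ravel() for t in range(n_steps - 1, n_samples)]
    )
    return features, targets[n_steps - 1 :]


# Toy data: 12 time steps with 3 state values each.
states = np.arange(12 * 3, dtype=float).reshape(12, 3)
torque = np.arange(12, dtype=float)
X, y = make_windows(states, torque)
print(X.shape, y)
```

The resulting rows can be passed to any tabular regressor, which is how a tree-based model can use sequential information without a recurrent architecture.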

To collect labeled data as the training dataset of the models shown in (18)–(20), a commercial six-axis force sensor (WEF-6A200-4-RCD, WACOH Inc.) was installed on the end effector to provide the ground truth, as shown in Figure 7. During the training process, the end effector was posed in various configurations with forces/torque at different levels, and a total of more than 260,000 data points were collected for training. In the experiment, we selected five postures, applied force on the force sensor by hand, and compared the force sensor values with the results predicted by the model (20). The force applied by the hand ranged from 0 to 75 N. The largest difference in force occurred mainly between 0 and 20 N. The maximum error is about 20 N, as shown in Figure 8, and the average error is about 9.4 N. The maximum force error reaches 20 N because the empirical system is affected by inertia, which is also supported by the physics-based models. Therefore, the estimation is imprecise at the very beginning and then improves, as shown in Figure 8. Furthermore, LSTM is a time-series model that uses a period of past data as the inputs, so the model takes time to converge for operation. The figure also shows that the model can capture dramatic changes of the force, which indirectly indicates that the estimation system has sufficient bandwidth. However, the estimation of small-magnitude forces is less accurate. We believe this phenomenon results from the manipulator's mechanical properties (e.g., the dead-zone of the joint, static friction, viscous damping forces), as well as the accuracy of the mechatronic systems (e.g., accuracy of the current measurement).

Figure 7.

A commercial six-axis force sensor (WEF-6A200-4-RCD, WACOH Inc.) was installed on the end effector to collect ground truth data for training and quantitative experiment validation.

Figure 8.

Time-series data of the estimated force (dashed curve) and the measured force (solid curve).

Data-driven models for the 4th and 5th joints of the manipulator

After describing the development of various models for the first three axes of the manipulator, this section focuses on the model development of the 4th and the 5th joints of the manipulator. As mentioned previously, due to the orthogonality and intersection of the joint axes, the modeling of the 4th and the 5th joints was executed separately. In addition, because these joints of the manipulator are close to the end effector, instead of generating test data for evaluation, the performance of the joint torque estimation was directly observed using a multi-axis force sensor mounted on the end effector (WEF-6A200-4-RCD, WACOH Inc.) as the ground truth. Figure 9 shows the manipulator layouts for experimental validation of the 4th and the 5th joint models. Joint 5 could be posed in two configurations, as shown in Figure 9(b) and (c). The joint torque was transformed into the force at the end effector using the method described in “Virtual force sensor” section, so the resulting data could be compared with the ground truth data. Note that the joint torque of the manipulator contributes to two tasks: external torque contributes to the forces/torque on the end effector, which interacts with the surroundings, and internal torque is utilized for manipulator motion dynamics and for compensating internal loss. Subtracting the internal torque (i.e., derived by the data-driven model) from the total joint torque (i.e., torque measured at the joints) yields the external torque, which is then transformed to the forces/torque at the end effector using the Jacobian. The Jacobian defines the “instant” relationship between the joint torque and the forces/torque at the end effector, which, when viewed from the perspective of geometry, can be considered a virtual force with geometrical considerations. Figure 10 shows the estimated force from the virtual force sensor and the measured force from the six-axis force sensor. The RMSEs of the estimated force in the joint 4 and joint 5 experiments were $\pm 2$ N and $\pm 1$ N, respectively.

Figure 9.

Experimental setup for validating the performance of the model for the 4th joint in (a) and the 5th joint in (b)–(c), which has two operating configurations.

Figure 10.

Comparison of the estimated force from the virtual force sensor (x-axis) and measured force (y-axis) from the six-axis force sensor. The dashed line represents ideal conditions.

Table 7 lists the performance of the XGBoost model for the 5th joint. Here, the averaged error, standard deviation of the error, and maximum error between the real force measured by the multi-axis sensor and the force estimated by the model were utilized as the criteria. Four different settings were explored. Trial 1 utilized instant input data (i.e., no sequential data). Trial 2 used the same setup but with data normalization using

(\text{data} - \text{mean of data}) / \text{std. of data}

(21)

Table 7.

Performance of the model for the 5th joint using different model settings (unit:N).

Setting	$E_{max}$	$E_{mean}$	$E_{std}$
1	29.469	6.754	3.844
2	26.701	7.809	4.882
3	20.654	4.32	3.115
4	29.619	13.108	4.774

Trial 3 utilized sequential data inputs, where 10 time steps were used, the same number as in the LSTM model. Trial 4 used the same input data as Trial 3, with the data normalized as in Trial 2. The table shows that using sequential data helps to improve the performance of the XGBoost model and that data normalization is unnecessary. The general normalization method rescales the data to a distribution with a mean of 0 and a variance of 1. This mainly benefits gradient-based training, but here we found that normalization did not improve the results. The main reason may be that the original values are compressed to between −1 and 1; thus, the resolution of the data values becomes smaller, so the performance is not improved.

Design of a motion planner

In “Performance of the dynamic models” section, the accuracy and ability of the physics-data hybrid model to capture the dynamics of the articulated robot were verified. In addition to manipulation, an industrial robot relies on trajectory planning to pick and place objects. The physics-data hybrid model is therefore utilized to build a motion planner for the robot arm, as shown in Figure 4. The motion planner here focuses on speed optimization of the robot arm and aims to reduce the elapsed time of a chosen trajectory. The experimental results can be observed in “Speed optimization of trajectories” section.

Deep RL was employed for trajectory optimization due to its ability to streamline complex constraint settings and analytics while still yielding comparable results. The specific algorithm utilized in this context was proximal policy optimization (PPO), introduced by OpenAI in 2017.³⁵ PPO operates on an actor-critic architecture, where the actor determines actions, such as adjusting the positions or timestamps of via points. Subsequently, the critic evaluates the actor's actions, assigning a score to guide the actor toward actions that lead to higher scores. In this scenario, the reward is designated for speed optimization, with trajectories receiving higher rewards for shorter elapsed times. The trajectory yielding the highest reward is considered the optimal one.

The motion planner here aims to reduce the elapsed time of a trajectory that passes through the start point, the via points, and the end point. The reward function of RL is set as:

Rewards = \begin{cases} a \times | τ_{limit} - τ_{peak} |, & \text{if force exceeds the limit} \\ b \times T_{reduction}, & \text{otherwise} \end{cases}

(22)

where $τ_{limit} \in R^{n}$ is the limit of torque, $τ_{peak} \in R^{n}$ is the peak torque of a joint of a robot, $n$ is the DOF of the planned robotic system, $T_{reduction}$ is the reduced elapsed time of a trajectory via RL optimization, a (<0) is a negative constant, and b (>100) is a positive constant. The optimal ‘a’ is approximately −1 and optimal ‘b’ is around 1000 for this robot arm.⁶

The reward function (22) penalizes trajectories whose torque exceeds the limit. The reward is negative when the torque exceeds the limit, and the more the torque exceeds the limit, the more negative the reward. $T_{reduction}$ is the difference between the elapsed time before and after optimization. That is, the PPO algorithm uses the simulator based on the hybrid model to predict the dynamics of the manipulator and plans a speed-optimized trajectory via the given reward function. The optimized time-efficient trajectory is then generated and transferred to the manipulator for testing, as discussed in “Speed optimization of trajectories” section.
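The piecewise reward can be sketched as a small function. Two points are assumptions rather than details from the text: the per-joint vectors are reduced to a scalar by taking the joint with the largest excess over its limit, and the peak torque is compared by magnitude. The function name, argument names, and example numbers are hypothetical; the defaults a = −1 and b = 1000 are the approximate values quoted above.

```python
import numpy as np


def trajectory_reward(tau_peak, tau_limit, t_before, t_after, a=-1.0, b=1000.0):
    """Piecewise reward: torque penalty if any joint exceeds its limit, time bonus otherwise."""
    tau_peak = np.abs(np.asarray(tau_peak, dtype=float))
    tau_limit = np.asarray(tau_limit, dtype=float)
    excess = tau_peak - tau_limit
    if np.any(excess > 0.0):
        # Assumption: penalize by the largest excess over the limit among all joints.
        return a * float(excess.max())
    # Otherwise reward the reduction in elapsed time of the trajectory.
    return b * (t_before - t_after)


# Hypothetical two-joint examples (torque in N·m, time in s).
within_limit = trajectory_reward([10.0, 20.0], [15.0, 25.0], t_before=2.0, t_after=1.5)
over_limit = trajectory_reward([10.0, 30.0], [15.0, 25.0], t_before=2.0, t_after=1.5)
print(within_limit, over_limit)
```

With b much larger than |a|, feasible trajectories are separated mainly by their time saving, while any limit violation immediately yields a negative reward.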

Experiment

The virtual force sensor described in the “Design of a virtual force sensor” section was evaluated in two experiments. The first, wiping a table, involved moving the end effector of the manipulator over a flat surface with a constant normal force, $F_{Z}$, estimated using the virtual force sensor. The second was the peg-in-hole task, which utilized all three axes of the virtual sensor. To quantitatively evaluate performance, the same commercial force sensor shown in Figure 7 was utilized as the ground truth. The KF was applied to filter estimation noise. Additionally, a force control strategy was deployed to modulate the interaction forces between the manipulator and the objects.

Wiping the table

Wiping the table was the first set of experiments conducted to evaluate the performance of the virtual force sensor. The robot was set to move on a flat surface according to the trajectory shown in Figure 11(a). The same commercial six-axis force sensor was mounted on the end effector to provide the ground truth force data, as shown in Figure 11(b). The robot was set to apply 60 N on the surface, and Figure 12 shows the results. The average error and the standard deviation of the errors between the estimated and measured forces are 7.995 and 5.843 N, respectively. Unlike the peg-in-hole task, which utilized impedance control, a simple digitized position control was utilized in this set of experiments: the compensated displacement was determined directly from the force error, without impedance.
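A direct force-error-to-displacement correction of this kind could look like the sketch below. The text only states that the compensated displacement was determined directly from the force error; the proportional form, the gain, the clamp, and all names here are hypothetical choices made for illustration.

```python
def displacement_compensation(f_desired, f_estimated, gain=1e-5, max_step=5e-4):
    """Position correction (m) along the surface normal from a force error (N).

    gain     -- hypothetical proportional gain in m/N
    max_step -- hypothetical clamp on the correction per control cycle, in m
    A positive result means "press further into the surface".
    """
    force_error = f_desired - f_estimated
    step = gain * force_error
    # Clamp so that a noisy force estimate cannot command a large jump.
    return max(-max_step, min(max_step, step))
```

For the 60 N target used in this experiment, an estimated force of 50 N would give a correction of 0.1 mm per cycle with these illustrative values.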

Figure 11.

The trajectory and setup of the wiping the table experiment.

Figure 12.

In the wiping the table experiment, the manipulator was set to wipe the table with constant 60 N force (dashed line). The dashed-dot and solid curves represent the estimated and measured contact forces between the robot and the table, respectively.

Peg-in-hole

The virtual force sensor was then applied in the peg-in-hole manipulation task, which requires force/torque sensing information. As stated in the literature,²²⁻²⁴,²⁷ the peg-in-hole task was designed to be applicable under the following circumstances: (i) no models of the peg, the hole, or the environment are required; (ii) the peg and hole are nearby initially, but their relative position and orientation are unknown; (iii) the hole does not have a guiding edge or a chamfer for easy assembly.

The peg-in-hole process involves multiple contact scenarios in which the force/torque conditions vary. If the virtual sensor $(M_{X}, M_{Y}, F_{Z})$ does not sense contact during the approach, the peg keeps moving forward, as shown in Figure 13(a). Once the peg contacts the hole, two contact scenarios exist, as shown in Figure 13(b) and (c). When the peg is stuck at the hole and cannot get in (small picture on the left of Figure 13(b)), or when one side and the bottom are stuck (small picture on the other side), the situation is regarded as “stuck outside the hole.” In this case, the manipulator moves backward a small distance and simultaneously adjusts the posture, rotating the peg and moving it laterally (red and blue arrows). If the peg contacts the hole with only one side, as shown in Figure 13(c), the scenario is regarded as “stuck inside the hole,” and the peg is rotated and moved laterally to resolve it (red and blue arrows). The overall control strategy of the peg-in-hole task is presented in Figure 14.

Figure 13.

The peg-in-hole process: (a) approaching the hole, and the two contact scenarios: (b) stuck outside the hole and (c) stuck inside the hole.

Figure 14.

The control strategy of the peg-in-hole task; (a) the overall control structure; (b) the control flow chart.

During the peg-in-hole process, impedance control was utilized.⁵²,⁵³ The system was modeled as a mass-spring-damper system:

m \ddot{θ}_{dc} + b \dot{θ}_{dc} + k(t) θ_{dc} = τ

(23)

where $m$, $b$, $k$, and $τ$ represent mass, damping, spring, and external torque, and the state $θ_{dc} = θ_{d} - θ_{c}$ describes the difference between the desired angle $θ_{d}$ and the current angle $θ_{c}$. To yield a smoother response, $k(t)$ was designed to be responsive to the torque difference,⁵² $e_{τ} = τ_{d} - τ$:

k(t) = k_{τ} e_{τ} θ_{dc}^{-1} + k_{v} \dot{e}_{τ} θ_{dc}^{-1}

(24)

After discretization and the rearrangement of terms, $θ_{dc}$ could be represented as

θ_{dc} = (k_{τ} τ + k_{v} \frac{τ}{Δt}) / (\frac{m}{Δt^{2}} + \frac{b}{Δt})

(25)

Then, the parameters ($m$, $b$, $k_{τ}$, $k_{v}$, $Δt$) were selected empirically for the manipulator.
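Equation (25) maps a torque reading to an angle offset in a single step, which can be sketched as follows. The function name is hypothetical, and the numerical values in the usage note are illustrative only: the article does not list the selected parameters, and taking $Δt$ as 0.008 s assumes the controller runs at the 125 Hz communication rate.

```python
def theta_dc_command(tau, m, b, k_tau, k_v, dt):
    """Angle offset theta_dc from the torque tau, following Equation (25).

    m, b       -- mass and damping of the impedance model
    k_tau, k_v -- gains of the variable stiffness k(t)
    dt         -- discretization time step in seconds
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    numerator = k_tau * tau + k_v * tau / dt
    denominator = m / dt**2 + b / dt
    if denominator == 0:
        raise ValueError("m and b must not both be zero")
    return numerator / denominator
```

For example, `theta_dc_command(2.0, m=1.0, b=10.0, k_tau=0.5, k_v=0.01, dt=0.008)` evaluates to 3.5/16875, roughly 2.07e-4. Because the expression is linear in the torque, doubling the torque doubles the commanded offset.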

With the three main contact states of the peg and the conversion of force changes into position commands (the impedance controller) established, the control strategy of the peg-in-hole task in Figure 14 can be described in detail. At the start of each iteration of the strategy loop, the state of the manipulator is read first: the angular position, angular velocity, and angular acceleration of each joint ($θ_{1 \sim 6}$, ${\dot{θ}}_{1 \sim 6}$, ${\ddot{θ}}_{1 \sim 6}$) and the spatial coordinates of the end effector ($P_{X}$, $P_{Y}$, $P_{Z}$). The joint states $θ_{1 \sim 6}$, ${\dot{θ}}_{1 \sim 6}$, ${\ddot{θ}}_{1 \sim 6}$ are sent to the virtual force sensor, which returns the three-axis force/torque information $M_{X}$, $M_{Y}$, $F_{Z}$. This information determines which of the contact stages in Figure 13 the peg and hole are in, and hence which “Manipulator motion” command should be sent to the manipulator. Besides identifying the contact stage, the current arm state ($θ_{1 \sim 6}$, ${\dot{θ}}_{1 \sim 6}$, ${\ddot{θ}}_{1 \sim 6}$, $P_{X}$, $P_{Y}$, $P_{Z}$) and the three-axis force information at the end effector are also needed to determine how to adjust the posture and how the force should be applied.
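One iteration of this strategy loop can be outlined as below. This is a structural sketch only: the three contact states and their motions follow the description of Figure 13, but all names are hypothetical, and the rule that classifies the contact state from ($M_{X}$, $M_{Y}$, $F_{Z}$) is passed in as a callable because the article does not give its thresholds.

```python
from enum import Enum, auto


class ContactState(Enum):
    NO_CONTACT = auto()      # Figure 13(a): keep approaching
    STUCK_OUTSIDE = auto()   # Figure 13(b)
    STUCK_INSIDE = auto()    # Figure 13(c)


# Motion primitives per contact state, as described for Figure 13.
MOTION_COMMANDS = {
    ContactState.NO_CONTACT: ("move_forward",),
    ContactState.STUCK_OUTSIDE: ("move_backward", "rotate", "move_laterally"),
    ContactState.STUCK_INSIDE: ("rotate", "move_laterally"),
}


def strategy_step(read_state, virtual_force_sensor, classify_contact):
    """Run one iteration of the peg-in-hole strategy loop.

    read_state           -- returns (joint_state, end_effector_position)
    virtual_force_sensor -- maps joint_state to (M_X, M_Y, F_Z)
    classify_contact     -- maps (wrench, position) to a ContactState;
                            its decision rule is an assumption of the caller
    """
    joint_state, position = read_state()
    wrench = virtual_force_sensor(joint_state)
    state = classify_contact(wrench, position)
    return state, MOTION_COMMANDS[state]
```

Keeping the sensor and the classifier as injected callables lets the same loop run against either the simulator or the physical manipulator.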

The peg-in-hole strategy, which utilizes information derived from the virtual force sensor, was experimentally evaluated. The depth and diameter of the hole were set to 30 and 21.8 mm, respectively. Figure 15 shows snapshots of the experiments using a peg 40 mm in length and 220 mm in diameter. The MAEs of the force/torque values between the commercial sensor and the virtual sensor are 0.378 Nm ( $M_{X}$ ), 0.242 Nm ( $M_{Y}$ ), and 9.438 N ( $F_{Z}$ ). Figure 16 shows the states of the manipulator during the peg-in-hole process using the tighter-fitting peg with a length of 48 mm and a diameter of 21.4 mm. The time sequence shows that the peg first approached the hole, got stuck there (i.e., the spike at 10 s), and was then adjusted in position and orientation (as in Figure 13(b)) until it could move forward and be fully inserted. In the empirical system, real-time performance is mainly determined by the communication speed between the manipulator and the PC and by the time the model takes to predict its output. The former runs at 125 Hz, and the latter at about 63 Hz.
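The MAE quoted above is the mean of the absolute differences between the virtual-sensor estimate and the commercial-sensor reading on one axis. A minimal sketch of that computation follows; the function name is hypothetical and the sample values in the usage note are made up for illustration, not taken from the experiment.

```python
def mean_absolute_error(estimated, measured):
    """Mean absolute error between two equally long sequences of samples."""
    if len(estimated) != len(measured):
        raise ValueError("sequences must have the same length")
    if len(estimated) == 0:
        raise ValueError("sequences must not be empty")
    total = sum(abs(e - m) for e, m in zip(estimated, measured))
    return total / len(estimated)
```

For instance, `mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])` averages the errors 0.5, 0, and 1 to give 0.5. Applied per axis, it yields one value each for $M_{X}$, $M_{Y}$, and $F_{Z}$.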

Figure 15.

Snapshots of the peg-in-hole experiment.

Figure 16.

The top 6 subfigures illustrate the displacement and direction of the peg (the coordinate information is defined in Figure 1(b)). The bottom 3 subfigures display the force signals of the peg (the coordinates are defined in Figure 6(a)).

Speed optimization of trajectories

The optimized trajectories were first learned and simulated with the physics-data hybrid dynamic model (simulator), and then experiments were conducted with the TM5-700 manipulator to verify the trajectory planner. Three trajectories were selected as baselines (Trajectory 1, 2, and 3 in Table 8). Each trajectory was planned with the simulator, and the experimental results are listed in Table 8. “Original” is the elapsed time without speed optimization and “Optimized” is the elapsed time with speed optimization. The speed optimization method reduced the elapsed time by an average of 20.4% over the three trajectories. To conclude, the motion planner developed in the “Design of a motion planner” section is functional and succeeds in reducing the elapsed time of all three selected trajectories.

Table 8.

Elapsed time of the trajectories before and after speed optimization (unit: s).

Time	Trajectory 1	Trajectory 2	Trajectory 3
Original	1.54	1.31	1.64
Optimized	1.16	1.11	1.29
Improvement	24.7%	15.3%	21.3%
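The improvement row of Table 8 follows from the relative reduction in elapsed time, (original − optimized) / original. The short check below recomputes it from the tabulated times; the variable and function names are illustrative.

```python
# Elapsed times in seconds, copied from Table 8.
original = {"Trajectory 1": 1.54, "Trajectory 2": 1.31, "Trajectory 3": 1.64}
optimized = {"Trajectory 1": 1.16, "Trajectory 2": 1.11, "Trajectory 3": 1.29}


def improvement_percent(t_original, t_optimized):
    """Relative reduction in elapsed time, in percent."""
    return 100.0 * (t_original - t_optimized) / t_original


improvements = {
    name: improvement_percent(original[name], optimized[name])
    for name in original
}
average_improvement = sum(improvements.values()) / len(improvements)

for name, value in improvements.items():
    print(f"{name}: {value:.1f}%")
print(f"Average: {average_improvement:.1f}%")
```

Rounded to one decimal place, this reproduces the tabulated 24.7%, 15.3%, and 21.3%, and an average of 20.4%.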

Conclusion and future work

In this article, we reported on the development of a virtual force sensor and a motion planner for a manipulator based on the physics-data hybrid dynamic model. The hybrid model is more accurate than the physics-based model and requires less training data than the data-driven model. Furthermore, the modeling results reveal that, among the tested DNN, LSTM, and XGBoost architectures with hyperparameter optimization, XGBoost models the manipulator dynamics, including un-modeled dynamics, most accurately.

The external torque of the manipulator is then derived by subtracting the derived internal torque from the total motor torque. The external torque is further transformed into a three-axis virtual force/torque on the end effector through a geometrical relation and a machine learning technique. Finally, the virtual sensor is utilized in two applications. For the table-wiping task with a designated normal force of 60 N, the average error and the standard deviation of the errors between the estimated and measured forces are 7.995 and 5.843 N, respectively. For the peg-in-hole task, a peg 48 mm in length and 21.4 mm in diameter can be inserted into a hole 30 mm in depth and 21.8 mm in diameter. The MAEs of the force/torque values between the commercial sensor and the virtual sensor are 0.378 Nm ( $M_{X}$ ), 0.242 Nm ( $M_{Y}$ ), and 9.438 N ( $F_{Z}$ ). Lastly, the learning-based motion planner successfully plans time-efficient trajectories for the manipulator. Three trajectories were tested, and their elapsed time was reduced by an average of 20.4%. These results suggest that the approach has the potential to be implemented in industrial production lines.

We are in the process of refining the model so that it can compensate for the dead zone and friction effects of the manipulator more accurately. The model will be evaluated using different states as well. Furthermore, we plan to design an advanced controller and sim-to-real skill transfer for the manipulator based on the developed hybrid dynamic model (digital twin).

Footnotes

Authors’ contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Jyun-Ming Liao and Wu-Te Yang. The first draft of the article was written by Jyun-Ming Liao, and it was revised by Wu-Te Yang. The final article was written by Pei-Chun Lin. Pei-Chun Lin also acquired funding and supervised and managed the project. All authors read and approved the final article.

Preprint acknowledgement

The authors acknowledge this manuscript has been submitted to a preprint server and the link is accessed at .

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Science and Technology Council (NSTC), Taiwan, under contract: MOST 110-2634-F-007-027- and MOST 111-2634-F-007-010-.

ORCID iDs

Wu-Te Yang

Pei-Chun Lin

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

1. Kuo CFJ, Weng. An integrated curvature surface inspection and prediction system for 5-axis synchronization machining. Int J Adv Manuf Technol 2021; 115: 3873–3886.
2. Huang, Kuo, et al. Development of an intelligent grinding system for fabricating aspheric glass lenses. Int J Adv Manuf Technol 2020; 111: 1351–1359.
3. Lai, Lin. Real-time surface roughness estimation and automatic regrinding of ground workpieces using a data-driven model and grinding force inputs. Int J Adv Manuf Technol 2024; 132: 925–941.
4. Lien JJJ. Robot arm grasping using learning-based template matching and self-rotation learning network. Int J Adv Manuf Technol 2022; 121: 1915–1926.
5. Berkay, Demir, Mistikoglu. Recent developments in computer vision and artificial intelligence aided intelligent robotic welding applications. Int J Adv Manuf Technol 2023; 126: 4763–4809.
6. Chen, Yang, Chen, et al. Manipulator trajectory optimization using reinforcement learning on a reduced-order dynamic model with deep neural network compensation. Machines 2023; 11: 350.
7. Russell, Norvig. Artificial intelligence: a modern approach. 3rd ed. Upper Saddle River, NJ: Pearson, 2009.
8. Copeland. What’s the difference between artificial intelligence, machine learning and deep learning? https://blogs.nvidia.com/blog/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/ (accessed 16 July 2020).
9. Bianchini, Scarselli. On the complexity of neural network classifiers: a comparison between shallow and deep architectures. IEEE Trans Neural Netw Learn Syst 2014; 25: 1553–1565.
10. Lipton, Kale, Elkan, et al. Learning to diagnose with LSTM recurrent neural networks, https://arxiv.org/pdf/1511.03677.pdf (accessed 16 July 2020).
11. Meng, Finley, et al. Lightgbm: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 2017; 30: 3146–3154.
12. Gillespie, Best, Townsend, et al. Learning nonlinear dynamic models of soft robots for model predictive control with neural networks. In: 2018 IEEE international conference on soft robotics (RoboSoft). IEEE, 2018, pp.39–45.
13. Boucetta, Abdelkrim. Neural network modeling of a flexible manipulator robot. In: Computer information systems and industrial management: 11th IFIP TC 8 international conference (CISIM), 2012, pp.395–4045.
14. Lai, Liu, Zhang, et al. Fuzzy adaptive inverse compensation method to tracking control of uncertain nonlinear systems with generalized actuator dead zone. IEEE Trans Fuzzy Syst 2016; 25: 191–204.
15. Selvin, Vinayakumar, Gopalakrishnan, et al. Stock price prediction using LSTM, RNN and CNN-sliding window model. In: International conference on advances in computing, communications and informatics (ICACCI). IEEE, 2017, pp.1643–1647.
16. Chen, Guestrin. Xgboost: a scalable tree boosting system. In: ACM SigKDD international conference on knowledge discovery and data mining, 2016, pp.785–794.
17. Jiaramaneepinit, Nuthong. Application of neural networks for vehicle classifiers: extreme learning machine approach. In: International conference on electrical engineering/electronics, computer, telecommunications and information technology (ECTI-CON), 2018, pp.241–244.
18. CHG. Deep learning approaches for improving robustness in real-time 3D-object positioning and manipulation in severe lighting conditions. Int J Adv Manuf Technol 2023; 192: 3829–3847.
19. Erkol, Bailey, Palardy, et al. Predicting composite laminates roughness: data-driven modeling approaches using force sensor data from robotic manipulators. Int J Adv Manuf Technol 2023; 128: 1801–1813.
20. Villani, Pini, Leali, et al. Survey on human-robot interaction for robot programming in industrial applications. IFAC-PapersOnline 2018; 51: 66–71.
21. Sherwani, Asad, Ibrahim BSKK. Collaborative robots and industrial revolution 4.0 (IR 4.0). In: International conference on emerging trends in smart technologies (ICETST), 2020, pp.1–5.
22. Lin. Development of an intelligent transformer insertion system using a robot arm. Robot Comput-Int Manuf 2018; 51: 209–221.
23. Kim, Song. Hole detection algorithm for square peg-in-hole using force-based shape recognition. In: IEEE international conference on automation science and engineering (CASE), 2012, pp.1074–1079.
24. Tang, Lin, Zhao, et al. Autonomous alignment of peg and hole by force/torque measurement for robotic assembly. In: IEEE international conference on automation science and engineering (CASE), 2016, pp.162–167.
25. Qin, Wang, Yuan, et al. Multi-sensor perception strategy to enhance autonomy of robotic operation for uncertain peg-in-hole task. Sensors 2021; 21: 3818.
26. Beltran-Hernandez, Ramirez-Alpizar DPIG, Harada. Variable compliance control for robotic peg-in-hole assembly: a deep-reinforcement-learning approach. Appl Sci 2020; 10: 6923.
27. Park, Lee, et al. Compliance-based robotic peg-in-hole assembly strategy without force feedback. IEEE Trans Ind Electron 2017; 64: 6299–6309.
28. Yen, Tang, Lin, et al. Development of a virtual force sensor for a low-cost collaborative robot and applications to safety control. Sensors 2019; 19: 2603.
29. A virtual sensor for collision detection and distinction with conventional industrial robots. Sensors 2019; 19: 2368.
30. Hwang, Minami, Ishikawa. Virtual torque sensor for low-cost RC servo motors based on dynamic system identification utilizing parametric constraints. Sensors 2004; 18: 455–470.
31. Craig. Introduction to robotics: mechanics and control. 3rd ed. Upper Saddle River, NJ: Pearson, 2005.
32. Tian, Collins. An effective robot trajectory planning method using a genetic algorithm. Mechatronics 2018; 14: 3856.
33. Števo, Sekaj, Dekan. Optimization of robotic arm trajectory using genetic algorithm. Mechatronics 2014; 47: 1748–1753.
34. Kober, Bagnell, Peters. Reinforcement learning in robotics: a survey. Int J Robot Res 2013; 32: 1238–1274.
35. Schulman, Wolski, Dhariwal, et al. Proximal policy optimization algorithms, https://arxiv.org/abs/1707.06347 (accessed 16 July 2020).
36. Sutton, Barto. Reinforcement learning: an introduction. 2nd ed. Cambridge, MA: MIT Press, 2018.
37. Reinhart, Shareef, Steil. Hybrid analytical and data-driven modeling for feed-forward robot control. Sensors 2017; 17: 311.
38. Carron, Arcari, Wermelinger, et al. Data-driven model predictive control for trajectory tracking with a robotic arm. IEEE Robot Automat Lett 2019; 4: 3758–3765.
39. Yuan, et al. Data-driven dynamic modeling for a swimming robotic fish. IEEE Trans Ind Electron 2016; 63: 5632–5640.
40. Zhu, Xiong, et al. Data-driven dynamics modeling and control strategy for a planar n-DOF cable-driven parallel robot driven by n+ 1 cables allowing collisions. J Mech Robot 2024; 16: 1–15.
41. Keras. https://keras.io/ (accessed 16 July 2020).
42. XGBoost. https://xgboost.readthedocs.io/en/latest/gpu/ (accessed 16 July 2020).
43. Claesen, Moor. Hyperparameter search in machine learning, https://arxiv.org/abs/1502.02127 (accessed 16 July 2020).
44. Snoek, Larochelle, Adams. Practical Bayesian optimization of machine learning algorithms. Adv Neural Inf Process Syst 2012;25.
45. Bergstra, Bengio. Random search for hyper-parameter optimization. J Mach Learn Res 2012; 13: 281–305.
46. Calandra, Seyfarth, Peters, et al. An experimental comparison of Bayesian optimization for bipedal locomotion. In: IEEE international conference on robotics and automation (ICRA). IEEE, 2014, pp.1951–1958.
47. scikit-learn. https://scikit-learn.org/ (accessed 16 July 2020).
48. Kappler, Meier, Ratliff, et al. A new data source for inverse dynamics learning. In: IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 2017, pp.4723–4730.
49. Gottschalk, Lin, Manocha. OBBTree: a hierarchical structure for rapid interference detection. In: Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, 1996, pp.171–180.
50. Huynh. Separating axis theorem for oriented bounding boxes. 2009.
51. , et al. Kalman filter and its application. In: 8th international conference on intelligent networks and intelligent systems (ICINIS), 2015, pp.74–77.
52. Lee, Buss. Force tracking impedance control with variable target stiffness. IFAC Proc 2008; 41: 6751–6756.
53. Lakshminarayanan, Kana, Mohan, et al. An adaptive framework for robotic polishing based on impedance control. Int J Adv Manuf Technol 2021; 112: 401–417.

Physics-data hybrid dynamic model of a multi-axis manipulator for sensorless dexterous manipulation and high-performance motion planning
