Abstract
Identification of robotic systems with hysteresis is the main focus of this article. Nonlinear AutoRegressive eXogenous input models are proposed to describe the systems with hysteresis, with no limitation on the nonlinear characteristics. The article introduces an efficient approach to select model terms. This selection process is achieved using an orthogonal forward regression based on the leave-one-out cross-validation. A sampling rate reduction procedure is proposed to be incorporated into the term selection process. Two simulation examples corresponding to two typical hysteresis phenomena and one experimental example are finally presented to illustrate the applicability and effectiveness of the proposed approach.
Introduction
Hysteresis, a memory-dependent, multivalued relation between input and output, is often observed in many robotic systems. The system may exhibit a path-dependent pattern, where multiple outputs are associated with increasing or decreasing but the same input and form a loop under cyclic excitation. It exists in many applications, such as actuators and sensors involving smart materials (e.g. piezoelectrics 1,2 and magnetostrictive materials 3,4 ) which possess the property of hysteresis in the reaction, and some special robotic systems with hysteretic dynamics like aerial vehicles. 5 The control of these robots is difficult due to the presence of the high nonlinearity. Such nonlinearity turns to be a limitation of open-loop operations in high-precision applications, results in instabilities in closed-loop operation, and degrades the tracking performance even with the use of feedback control in tracking control application. 6,7 It presents challenges in both analysis and controller design of robotic systems with hysteresis. A mathematical model, therefore, is required to predict and control the behavior of the robotic systems containing hysteresis.
Modeling of hysteresis has in recent years attracted increasing attention in various areas of robotics research, such as friction compensation, control of rubber tube actuator, elastic robot joints, and so on. Many researchers have studied this phenomenon, and many mathematical models have been developed to grasp the dynamic features of hysteresis phenomenon. 8 –11 Two of the most popular models are explained in detail in the following sections.
Preisach model
It was originally developed in the 1930s for magnetic hysteresis and is widely used to describe the hysteresis characteristics of smart materials 12 –14 in recent years. It has attracted considerable interest. Preisach model is a weighted superposition of simple independent delayed relays γαβ[u] with α, β corresponding to up and down switching values, respectively. 15 This model can be mathematically expressed as
where y(t) and u(t) are the output and input at time t, respectively. ρ(α, β) is known as Preisach density (distribution) function, which weights the single relay units in the α, β - plane and defines the shape of hysteresis curve. The model is regarded as a general description of hysteresis phenomenon with two properties: the casual property and rate-independent property. That is, the output of the model has no relation with future inputs and does not depend on derivatives of the input.
Bouc–Wen model
It was originally proposed by Bouc (1967) and subsequently generalized by Wen (1976). The model has been widely used in the field of structural engineering, which provides reasonable accuracy in the deterministic and stochastic dynamic analyses. The Bouc–Wen model is formulated from mathematical analysis of the characteristic response properties. It is given by the following differential equation
with
Since the nonlinearity of hysteresis may show totally different properties for robotics of different areas, it is impossible to find one accurate model suitable for all types of robotic systems with hysteresis, which not only exists in robot sensors with smart materials but also robot dynamics. The hysteresis can be generally classified into two major categories: static hysteresis and dynamic hysteresis. The former is rate independent, which means there is no correlation between the behavior and the variation rate of input, like the phenomenon corresponding to the Preisach model; while the latter is rate dependent, which depends on the variation rate of input, like the features captured by the Bouc–Wen model.
The purpose of this article is to propose a generalized model with a clear structure in some straightforward way, which can be used to model robotic systems with hysteresis with no limitations on the nonlinear characteristics. The existing models are mostly continuous-time models, 10,12,15 and the mathematical expressions are complicated, such as (1) and (2). One major shortcoming to the use of the continuous-time models is the difficulty of controller design due to the complicated form of the robotic model. Another shortcoming is lack of generalization capability. The nonlinearity and mechanism of hysteresis is different for robotics of different areas. The continuous-time models are given based on parameters with physical meaning, which analyze the mechanism for hysteresis of the particular robot. In addition, continuous-time model is not always available after theoretical analysis of the complex system. Then, discrete-time models can be considered. In practical application, discrete-time observations are obtained and the identification technique is realized by digital computer, so discrete-time models are more convenient in the identification procedure. The authors come up with the idea to use the discrete-time Nonlinear AutoRegressive with eXogenous input (NARX) model to describe the input–output relationship of robotic systems with hysteresis. The NARX model can exhibit a wide range of nonlinear behaviors with different properties such as chaos and bifurcations 19 and can be easily identified. The model structure is constructed in a linear-in-parameter form, which solves the difficulty of controller design.
The article is organized as follows: The NARX model is reviewed in the second section, and analysis for modeling robotic systems with hysteresis is made afterward. The third section introduces the orthogonal forward regression (OFR) algorithm for term selection. The fourth section is devoted to show the limitations of the OFR algorithm, and then an improved model selection procedure with new criteria is developed to solve the problems. Typical simulation examples are discussed in the fifth section, together with the detailed derivations and performance analysis. Experimental example of unmanned aerial vehicle is given in the sixth section. Finally, some concluding remarks are drawn and the limitations of potentially applying the polynomial NARX model to hysteresis identification are discussed in the seventh section. For the sake of easy implementation of digital computers, all the signal processing derived here are based on the discrete-time case.
The NARX model
The NARX representation has attracted considerable interest in modeling nonlinear systems, and many relevant analysis tools and identification algorithms have been developed in recent years. 20 –23 The NARX model is an extension of the linear ARX model. The AR model is used when current output is dependent only on the previous outputs, and the ARX model is used when there is exogenous input given to the AR model, as shown in Figure 1.

The ARX model.
The NARX model is defined as
where y[n] and u[n] are the output and input of the system, respectively; ny
and nu
are the maximum lags for system output and input, respectively; f(⋅) is a nonlinear function which needs to be identified from given observed data; e[n] is the prediction error, which is thought to be a zero mean noise sequence when f(⋅) gives the reasonable description of the nonlinear relation between the output and input. As mentioned in the Introduction, hysteresis is a multivalued and nonsmooth relation between input and output; however, when the input is expanded from u[n] to
The smooth single-valued function f(⋅) is often constructed by a linear-in-parameter form, using a variety of basic functions
where
where l is the degree of the polynomial model defined as the maximum order of model terms. Corresponding to (4), φ(⋅) is thus of the form
Most nonlinear models (e.g. the Volterra series model) only consider the input data to the system as the input to the model. The amount of model terms is sometimes very large; otherwise, the model cannot guarantee the approximation accuracy. The NARX model, however, adds the historical output data to the input variables of the model. The model terms decrease significantly, because the historical output information contains some nonlinear characteristics of the system. So, the NARX model can give a simpler structure for the complex systems. In continuous-time models (1) and (2), integral and differential calculus is needed which adds noise to the identification procedure, as a result algorithm complexity is increased to ensure the accuracy. The NARX model avoids the problem. The discrete-time data can be used directly to approximate the continuous-time system with good performance. The NARX model decomposes the system into a linear segment and a nonlinear segment. The linear segment represents the influence of historical data which deals with problem of rate dependence, while the nonlinear segment (e.g. polynomial form) corresponds to the static nonlinear mapping. The clear structure makes it easier for control design.
OFR algorithm
From (6), it can be easily found out that the total number of terms increases rapidly with the maximum lag
The ratio provides an effective means to measure the dependency between the output and each term of the model. Then the significant terms can be selected out gradually based on the OFR algorithm. The procedure is described as follows.
Step 1. Select out the term with the largest ERR.
Then the first significant term can be selected as
Step j. Let
In the first step,
Then the jth significant term can be selected as
The procedure terminates at the Mth step. M is determined by the terminating condition. For simple situations, it can be given as
where ERR
i
equals
Extension to the OFR algorithm
Limitations of the OFR identification approach
As shown in the third section, the form only considers the tolerance in the identification procedure when all the data are used for fitting. The model usually shows a bad performance when used on future data sets. 27 However, in order to be generally used, a model must have good extrapolation properties. It is necessary to split the data into two subsamples: a fitting sample and a validation sample. The fitting sample is used to measure the significance of model terms, while the validation sample is used to terminate the procedure. The fitting sample must contain the key characteristics of the robotic system to ensure that the model identified using the data has the ability to perform the hysteretic behavior with expected properties.
Since the NARX model takes the output history as part of variables in the model, it results in some drawbacks. The model works only when the historical outputs of the robotic system are available. If the output of the robotic system is not measurable in the procedure, the model terms that contain the output need to be calculated by the estimated result of the previous steps, then the system error will be accumulated, leading to inaccuracy in the model.
The data used for identification procedure is obtained from the continuous-time system through periodic sampling
where uc
(t), yc
(t) are the continuous-time input and output signals, respectively, and T is the sampling period. T has a tremendous influence on the effectiveness of the final model. The coefficients of the overall model are dependent on the choice of the sampling rate. Practical experience has shown that the mainly influence of sampling rate on modeling is the coefficients of model terms but not the model structure apart from the effects of oversampling.
28
Data oversampling will bring numerical problems for nonlinear model structure selection. When the T is chosen to be too small, the fitting data will be so intensive that
Then the model finally identified is always a combination of expressions in (13) with a random rule. As a result of that, the model only displays the linearized characteristics at every point, incapable of representing the nonlinear properties of robotic systems with hysteresis. Even if the final model doesn’t end up in the linear form for Ts is small, the ERR of term y[n − 1] is close to 1, while the terms determined by input history, like u[n − 1], u[n − 2], have extremely low ERRs, which will lead to wrong results in the following steps of the OFR procedure after y[n − 1] is selected, since the precision of orthogonalization is influenced profoundly by the noise of data, owing to the strong correlation between model terms. The sampling period, however, cannot be large, in case some important nonlinear information is missed resulting in low approximation accuracy to continuous-time system, especially when rapid change exists in the robotic system.
The PRESS statistic
A commonly used criterion in data splitting is the prediction error sum of squares (PRESS).
29
The procedure of leave-one-out cross validation goes like this: remove one observation at a time, use the removed observation as validation point and the remaining N − 1 observations as fitting sample, then estimate the coefficients and evaluate the deleted response
where
where hii represents the prediction variance, given by
An important property of hii can be easily derived as
When N >> n, 30
This approximation significantly reduces computational complexity of PRESS.
Considering the problems for model structure selection, a model identified using a finite data set may not have good performance over the fitting data, so the measure of model accuracy has to depend on an additional data set. The cross-validation way helps a lot for the model generalization when substituting the terminating condition of expression (11) with the PRESS statistic in the ORF algorithm. Moreover, the estimated function
Sampling rate reduction
Following the discussion in “The PRESS statistic” section, the sampling rate must be determined under full consideration of the dynamic characteristics of the data to guarantee that the OFR procedure can capture the main nonlinear effects of the robotic system. When the sampling period is too small, the nonlinear effects will appear to be local linearization, and the OFR procedure will be disturbed by the form of (13). So in situations of very small sampling period, the authors suggest using the improved algorithm followed by sampling rate reduction (SRR) procedure. A decimator in Figure 2, that is, a system with a low-pass filter followed by compression, is required for SRR by integer factor L.

Decimator for sampling rate reduction by L.
The maximum relative deviation (MRD) and the maximum relative error (MRE) are is proposed as the criteria for choice of L in this article. For the new discrete-time data after decimation, with the new sampling period of T′ (=LT), MRD is given as
where y′[k] = y[Lk] is the output of the new discrete-time data. Then the inferior limit of L can be determined by
where 0 < α < 1 is the expected confidence level. If MRD doesn’t satisfy the inequality, the output y′[n] can be directly derived from the past two points y′[n − 1] and y′[n − 2], that is, the linear relationship in (13) can already reach the requirement of modeling accuracy. The nonlinear information, thus, is not necessary to be contained in the model. However, models which capture little nonlinear characteristics cannot present the nonlinear properties of the robotic system and, certainly, have no significance for controller design in practice.
Then the procedure with SRR is given. First, evaluate the sampling rate 1/T of the original data set D 0 by the value of MRD. If the value is smaller than 1 − α or very close to it, we may use the following procedure to handle the oversampling data. The data rate is reduced by increasing integer factor Li (Li = 2, 3, 4,…) using a decimator, and the value of MRD i for each new data set D i with the data rate 1/Ti (Ti = LiT) is calculated. Note that the MRD i value is getting bigger as Li increases, which means local linear regression has less influence on model fitting, whereas some important nonlinear characteristics may be abandoned in the remaining data by the decimator when Li increases to some extent. So the authors propose the MRE to measure the importance of the information missed due to decimation by Li , defined as
where y[k] is the output of original data set D 0. The superior limit of decimation factor L is determined by
where 0 < α < 1 is the expected confidence level. We can choose an appropriate value of L based on the two restrictions in (20) and (22). Then the sampling rate is adjusted to 1/(LT). The model structure is finally selected by the OFR algorithm with the PRESS statistic following this SRR procedure.
Simulation studies
This section investigates the efficiency and performance of the polynomial NARX model for the identification of robotic systems with hysteresis, by applying the OFR algorithm with the PRESS statistic following SRR procedure to two typical examples. The first example is a static hysteresis, given by a simulated Preisach model, while the latter is a dynamic hysteresis, given by a simulated Bouc–Wen model.
A simulated Preisach model
Consider a Preisach model described by the expression as follows
where

Elements for model in (23): (a)
The model terms φ(⋅) in (4) are chosen to be polynomial functions in (6) determined by the following element:
Evaluation for L in the SRR procedure.
SRR: sampling rate reduction; MRD: maximum relative deviation; MRE: the maximum relative error.
The confidence level is usually expected to be 95% in practice, corresponding to α in expressions (20) and (22), then the decimation factor L can be set to be 11, 12, and 13 according to the results of the SRR procedure. The PRESS values (Table 2), by the OFR algorithm, with different L, over the data set of different size accordingly, are calculated for different model length n.
PRESS values for different model length.
PRESS: prediction error sum of squares.
The PRESS statistic suggests choosing 8 model terms over training data with L = 11 in the SRR procedure, 8 or 9 model terms for L = 12, and 9 model terms for L = 13, and the model terms selected out are almost the same as shown in Table 3.
The term selection results based on the data with different L.
The effect of data rate in modeling the robotic systems with hysteresis is sufficiently shown in the results in Table 3. The parameters in the model, but not the model structure, have the strong dependence on the sampling rate. The process also verifies the significance of the SRR procedure and finally shows a good performance for modeling the simulated Preisach model using a polynomial NARX model, which proves that the proposed method is effective for modeling robotic systems with static hysteresis like magnetostrictive actuators.
A simulated Bouc–Wen model
Consider a Bouc–Wen model described in (2), where

The PRESS statistic versus the model length. PRESS: prediction error sum of squares
It is clear that the model length is best to be 11. The model by choosing 11 model terms is finally identified as
The model structure identified by the OFR algorithm based on the PRESS statistic is in agreement with traditional modeling assumptions. Figure 5 displays the performance of (24) over the test data set consisting of 120 points. The continuous line is the real values from the simulated Bouc–Wen model, while the dashed line is the estimated output from the NARX model in (24). The relative error of each point is calculated to test the model validity, and the results are given in Figure 6, where the two horizontal lines indicate the desired error tolerance of 5%. The jumping phenomena can be ignored directly, because it is the result of zero values of y[n] in the denominator. Clearly, the model validity tests are well satisfied.

Model performance on the test data.

Model validity tests for (24).
Experimental result
This section aims at illustrating the effectiveness of the NARX model and the accuracy of above explained identification method. For this purpose, we employ an unmanned aerial vehicle for research. The mass and geometric characteristics are shown in Table 4.
Mass and geometric characteristics of the model.
In general situations of steady flights with small angle of attack, the models for aerodynamic forces and moments are linear to the state variables. However, for the flights with large angles of attack and sideslip or high angular rates, the forces and moments demonstrate hysteresis effects. The test data of longitudinal large-amplitude sinusoidal motions is investigated. Preisach model cannot apply to this situation, because the hysteresis is relevant to the angular rate. Bouc–Wen model is also inappropriate, because the differential operator on aerodynamic forces and moments will lead to an accumulation of errors. In the study by Greenwell, 5 a reduced-frequency model is discussed, which was proposed for modeling the hysteresis nonlinearity in aerial vehicle. Comparison of the NARX model and the reduced-frequency model is shown in Figure 7 based on the test data of the normal force coefficient CN and the angle of attack α with the frequency of 0.4 Hz. The NARX model and the proposed identification approach can be applied to the unmanned aerial vehicle with a high precision.

Comparison result.
Conclusions
The polynomial NARX model has been considered for modeling robotic systems with hysteresis. The model term selection problem has been investigated for using polynomial NARX models. A critical analysis of the standard OFR algorithm has shown some limitations, particularly when the sampling rate is high. The terminating condition for the OFR algorithm has been modified. The sampling rates over a reasonable range affect the parameter estimates but not affect the model structure. A SRR procedure is proposed based on calculating the MRD and MRE. The applicability and effectiveness of the SRR procedure for term selection have been demonstrated by two simulated examples and one experimental example. Another important property of using polynomial NARX models for hysteresis identification is that few assumption or priori knowledge about the robotic system is needed. When the robotic system is appeared to have a complex noise model, the NARX models should be extended to the nonlinear autoregressive moving average with exogenous variables models.
As a final remark, it should be pointed out that the proposed modeling approach for robotic systems containing hysteresis is not viable for the situation that the previous output data are not available in the process. One solution to this problem is to use the estimated values from the previous steps. However, the solution will cause a stepwise accumulation of errors, owing to the sensitivity of the iteration process to initial condition, so a compensation measure is needed to guarantee the accuracy.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China (grant nos 61673240 and 61603210) and the National Program on Key Basic Research Project (grant no 2012613189).
