
Abstract

Trajectory tracking is crucial for intelligent tracked vehicles, with model predictive control (MPC) being a widely used method due to its ability to handle predictions and constraints. However, conventional MPC based on kinematic models neglects the coupled longitudinal–lateral dynamics, leading to limited accuracy and stability. To address this, we propose an MPC strategy that integrates both kinematic and dynamic models for dual-motor-driven tracked vehicles. This approach uses lateral deviation, heading deviation, longitudinal velocity, and yaw rate as state variables, with motor torques as control inputs, explicitly capturing dynamic coupling and electric drive characteristics. Additionally, we introduce a deep Q-network (DQN)-based adaptive weight adjustment scheme to improve disturbance rejection and overcome the limitations of fixed MPC weights. This adaptive mechanism optimizes the weight matrix online under varying operating conditions. The proposed method is validated through MATLAB/Simulink–RecurDyn co-simulation and low-speed vehicle tests, demonstrating significant improvements in trajectory tracking, with reductions in lateral and heading deviations by 58.4% and 18.6%, respectively. Further, the DQN-based adaptive weighting leads to additional improvements, reducing lateral and heading deviations by 36.6% and 19.7%.

Keywords

dual motor driven tracked vehicle, trajectory tracking, handling stability, dynamics prediction model, MPC

Introduction

Tracked vehicles, with their superior off-road capability, high load capacity, low ground pressure, and zero-radius turning, are widely deployed in military, agricultural, engineering rescue, and emergency response applications.¹ Compared with wheeled vehicles, dual-motor independently driven tracked vehicles differ significantly in drivetrain structure and steering mechanism: steering is achieved by generating speed or torque differences between the two tracks. Under real-world operating conditions, they exhibit pronounced longitudinal–lateral dynamic coupling, complex track–ground interactions, and high slip ratios, posing considerable challenges to high-precision trajectory tracking and maneuvering stability control.^2,3

Trajectory tracking control methods mainly include the pure pursuit algorithm,⁴ feedforward-feedback control,⁵ PID control,^6,7 the linear quadratic regulator (LQR), and model predictive control (MPC). In recent years, machine learning algorithms have also been gradually applied to trajectory tracking control.⁸ The pure pursuit algorithm tracks the trajectory using a simplified steering kinematics model. It requires little computation, can be combined with feedforward-feedback control, and performs well at low speed on smooth road surfaces. The PID algorithm is simple, easy to tune, and robust, but it is suited only to decoupled linear time-invariant systems, which limits its use in nonlinear systems with strict real-time requirements. LQR assumes a linear time-invariant vehicle model, so its control performance depends on an accurate mathematical model. It cannot guarantee robustness and stability under time-varying parameters or disturbances, and it tracks poorly on sections with sudden curvature changes.⁹ MPC based on a vehicle model can handle multi-objective, multi-constraint, and multivariable optimization problems, and its rolling optimization and feedback correction can reduce or even eliminate the effect of time delay in the closed-loop system.^10–12 At present, MPC is mostly used for trajectory tracking of intelligent wheeled vehicles. Dual motor driven tracked vehicles differ significantly from wheeled vehicles in drivetrain structure, and they steer by changing the speed or torque of the tracks on both sides, so their steering principle is also very different. The dynamics of a tracked vehicle form a nonlinear time-varying system in both the lateral and longitudinal directions, which makes MPC well suited to its trajectory tracking control. Nevertheless, compared with wheeled vehicles, there is relatively little research on MPC trajectory tracking control for tracked vehicles.

The core elements of MPC are the prediction model, rolling optimization, and feedback correction. The accuracy and complexity of the prediction model affect both the computational efficiency and the trajectory tracking accuracy. In Li¹³ and Zhou,¹⁴ an MPC trajectory tracking controller is designed on a kinematics prediction model of the tracked vehicle; it tracks the trajectory by controlling the position deviation and heading angle deviation, with the left and right track speeds as control variables. In Burke,¹⁵ a kinematics prediction model that neglects slip is proposed, and the MPC controller uses the track speed and the yaw rate as control variables to track vehicle position and heading angle. In Hu et al.,¹⁶ slip is taken into account: a kinematics model based on the instantaneous steering center serves as the prediction model, the winding speeds of the left and right tracks are the control variables, and constraint expressions for the control and state variables are given. Because these studies do not consider dynamics constraints, their trajectory tracking deviations are significant. The prediction model in Tang et al.,¹⁷ Zhao et al.,¹⁸ and Lu et al.¹⁹ is the same as that in Hu et al.¹⁶ However, Tang et al.¹⁷ conduct a dynamics analysis, add limits on the track winding speed and on vehicle acceleration under the available ground adhesion, and bound the speed control increment, which yields higher tracking accuracy. Lu et al.¹⁹ conduct a steering dynamics analysis: speed constraints are set to prevent rollover, and the longitudinal offset of the steering center is constrained to keep the steering controllable, which addresses the low tracking accuracy caused by steering slip of dual motor driven high-speed tracked vehicles in off-road conditions. Nevertheless, constraining only the longitudinal velocity and acceleration cannot guarantee the lateral tracking accuracy of the tracked vehicle under many working conditions.

Recent studies have incorporated higher-fidelity models into MPC. Zhang et al.²⁰ established a dynamic model of dual-motor-driven tracked vehicles that explicitly accounts for track–ground interaction and motor torque coupling, while Chen et al.²¹ proposed a dynamic MPC framework capable of handling coupled dynamics and terrain uncertainty. Similarly, Elsharkawy et al.²² analyzed the dynamic response of tracked vehicles under slope and nonuniform turning conditions, highlighting the necessity of accurate coupled modeling. These works confirm that dynamic-model-based MPC can significantly improve tracking performance under challenging operating conditions. In this paper, the MPC trajectory tracking control method that integrates the kinematics and dynamics prediction models will be proposed.

Another challenge is that MPC cost function weights are typically set empirically and remain fixed, which hinders adaptability across varying speeds, path curvatures, and adhesion levels. Adaptive methods such as fuzzy tuning or Bayesian optimization have been introduced,^23,24 but online parameter selection adds to the online computational burden of the MPC, while offline parameter optimization requires a large amount of data and cannot adapt to changing operating conditions. Reinforcement learning has significant advantages in online parameter optimization and has become a current development trend.^25,26

Motivated by these challenges, this paper proposes a trajectory-tracking strategy that integrates a dynamic prediction model with MPC and introduces a deep Q-network (DQN)-based mechanism for online weight adaptation. The contributions are summarized as follows:

We develop a dynamic-model-based MPC framework for dual-motor-driven tracked vehicles, in which lateral deviation, yaw deviation, longitudinal velocity, and yaw rate are selected as state variables and track-side motor torques are used as control inputs. This enables the controller to explicitly account for longitudinal–lateral coupling and electric drive dynamics in both prediction and optimization.

We design a DQN-based adaptive weight adjustment scheme for the MPC cost function. Instead of relying on fixed or heuristically tuned weights, the proposed scheme learns an online mapping from vehicle states and reference information to the optimal weight matrix, thereby improving tracking accuracy and robustness under different speeds, curvatures, and adhesion levels.

The proposed approach is validated through MATLAB/Simulink–RecurDyn co-simulation and low-speed experiments, showing that dynamic-model-based MPC significantly outperforms kinematic MPC in terms of maximum lateral and yaw deviations, while DQN-based adaptation further enhances robustness and tracking performance.

Kinematics and dynamics modeling of tracked unmanned vehicle driven by dual motors

Electronic differential steering is adopted in the dual motor driven tracked vehicle, which eliminates the need for a mechanical or hydraulic steering mechanism: the vehicle steers by controlling the speed or torque of the motors on both sides. To design an effective model predictive trajectory control strategy, the time-varying nonlinear kinematics and dynamics models of electronic differential steering must first be established. The schematic diagram of the chassis structure of the dual motor driven tracked vehicle is shown in Figure 1.

Figure 1.

Schematic diagram of the chassis structure of a dual motor driven tracked vehicle.

Steering kinematics analysis

As shown in Figure 2, the tracked vehicle may slide laterally during steering, especially at high speeds (with speeds >0.8 rad/s), so that the velocity direction no longer coincides with the body axis. In Figure 2, XOY is the global coordinate system, φ is the heading angle, β is the sideslip angle at the center of mass, o is the instantaneous turning center of the vehicle, v is the vehicle centroid velocity, R is the turning radius, and ω is the yaw rate; v_L and n_L are the axial linear speed and driving wheel speed of the outer track, and v_R and n_R are those of the inner track; B is the center distance between the two tracks, L is the ground contact length of the track, and r is the radius of the driving wheel.

Figure 2.

Kinematics analysis of tracked vehicles.

(O–XY) denotes the global coordinate frame, φ is the heading angle, β is the sideslip angle at the vehicle center of mass, R is the turning radius, v is the centroid velocity, ω is the yaw rate, B is the track gauge, L is the track-grounding length.

The change in β is assumed to be negligible. This assumption is valid for low- to medium-speed trajectory tracking at moderate curvature, where β varies slowly and remains within a small range, so that it can be treated as a bounded disturbance in the kinematic error dynamics. For higher-speed or more aggressive steering maneuvers, β is estimated online and explicitly incorporated into the dynamic prediction model as a measurable disturbance. Under this assumption, the kinematics model of the tracked vehicle is:

\begin{matrix} [\begin{matrix} \overset{\cdot}{X} \\ \overset{\cdot}{Y} \\ \overset{\cdot}{φ} \end{matrix}] = [\begin{matrix} \cos (φ + β) & 0 \\ \sin (φ + β) & 0 \\ 0 & 1 \end{matrix}] [\begin{matrix} v \\ ω \end{matrix}] \\ = [\begin{matrix} \frac{\cos (φ + β)}{2 \cos (β)} & \frac{\cos (φ + β)}{2 \cos (β)} \\ \frac{\sin (φ + β)}{2 \cos (β)} & \frac{\sin (φ + β)}{2 \cos (β)} \\ \frac{1}{B} & - \frac{1}{B} \end{matrix}] [\begin{matrix} v_{R} \\ v_{L} \end{matrix}] \end{matrix}

(1)

When analyzing the steering motion of the tracked vehicle, it is necessary to consider the sliding phenomenon of the tracks. The slip rate of the track can be expressed as follows:

\begin{cases} s_{1} = \frac{ω_{1} r - v_{1}}{ω_{1} r} \times 100 \% \\ s_{2} = \frac{v_{2} - ω_{2} r}{ω_{2} r} \times 100 \% \\ s_{1} \in (0, 1), s_{2} \in (0, + \infty) \end{cases}

(2)

Where s₁ and s₂ represent the slip rates of the outer and inner tracks, ω₁ and ω₂ represent the angular velocities of the outer and inner drive wheels, and v₁ and v₂ represent the axial linear speeds of the outer and inner wheels, respectively.

The relationship between v₁, v₂ and ω₁, ω₂ is therefore:

\left\{ \begin{matrix} v_{1} = (1 - s_{1}) ω_{1} r \\ v_{2} = (1 + s_{2}) ω_{2} r \end{matrix} \right.

(3)

If $\tilde{s} = \frac{v}{ω r} - 1$ is defined, then equation (3) can be uniformly expressed as:

v = (1 + \tilde{s}) ω r

(4)

Applying equation (4) to each track and substituting the result into equation (1) gives:

[\begin{matrix} \overset{\cdot}{X} \\ \overset{\cdot}{Y} \\ \overset{\cdot}{φ} \end{matrix}] = [\begin{matrix} \frac{\cos (φ + β)}{2 \cos (β)} & \frac{\cos (φ + β)}{2 \cos (β)} \\ \frac{\sin (φ + β)}{2 \cos (β)} & \frac{\sin (φ + β)}{2 \cos (β)} \\ \frac{1}{B} & - \frac{1}{B} \end{matrix}] [\begin{matrix} (1 + {\tilde{s}}_{R}) r ω_{R} \\ (1 + {\tilde{s}}_{L}) r ω_{L} \end{matrix}]

(5)

Where the slip parameters ${\tilde{s}}_{R}$ and ${\tilde{s}}_{L}$ are estimated from state feedback. In our implementation, an extended Kalman filter (EKF) referencing the standard formulation in Rajamani²⁷ is used to estimate β and slip-related states from measured longitudinal speed, yaw rate and left/right motor angular speeds; the EKF outputs are then supplied to the dynamic prediction model and MPC constraints. The relationship between the angular velocity of the drive wheels and the speed of the drive motors on both sides is expressed as follows:

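ω_{i} = \frac{2 π n_{i}}{60 i} \approx 0.1047 \frac{n_{i}}{i}

(6)

Where n_i is the motor speed (r/min), ω_i is the drive wheel angular velocity (rad/s), and i is the reduction ratio from the motor to the driving wheel. The corresponding track winding speed ω_i r, expressed in km/h with r in m, is 0.377 r n_i / i.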
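As an illustration of how equations (1), (4), and (5) fit together, the following minimal Python sketch first converts drive-wheel angular speeds into actual track speeds with the slip parameters and then evaluates the pose rates. The function names and all numerical values are hypothetical and chosen only for illustration; they are not taken from the test vehicle.

```python
import math


def track_speeds(omega_r, omega_l, s_r, s_l, r):
    """Equation (4) per track: v = (1 + s_tilde) * omega * r."""
    return (1.0 + s_r) * omega_r * r, (1.0 + s_l) * omega_l * r


def pose_rates(phi, beta, v_r, v_l, track_gauge):
    """Equation (1): [X_dot, Y_dot, phi_dot] from right/left track speeds."""
    v = (v_r + v_l) / (2.0 * math.cos(beta))  # centroid speed
    x_dot = v * math.cos(phi + beta)
    y_dot = v * math.sin(phi + beta)
    phi_dot = (v_r - v_l) / track_gauge       # yaw rate
    return x_dot, y_dot, phi_dot


# Illustrative values only (assumed, not from the paper)
v_r, v_l = track_speeds(omega_r=10.0, omega_l=8.0, s_r=-0.05, s_l=0.02, r=0.25)
print(pose_rates(phi=0.0, beta=0.0, v_r=v_r, v_l=v_l, track_gauge=1.6))
```

With zero heading and zero sideslip the lateral rate vanishes and the yaw rate reduces to the track speed difference divided by B, which is a quick sanity check on the sign convention of equation (1).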

Dynamics analysis during straight-line driving

The force balance equation for the straight-line traveling of the tracked vehicle is as follows:

\begin{matrix} F_{t} = F_{f} + F_{i} + F_{w} + F_{j} \\ = fG + G \sin α + \frac{C_{D} A}{21.15} v^{2} + δ M \frac{dv}{dt} \end{matrix}

(7)

Where G is the total vehicle weight (N), M is the vehicle mass (kg), F_t is the required traction force of the entire vehicle (N), F_f is the driving resistance of the tracked vehicle (N), f is the road resistance coefficient, F_i is the gradient resistance (N), α is the road gradient angle, F_w is the air resistance (N), C_D is the air drag coefficient, A is the frontal area of the vehicle (m²), v is the vehicle speed (in km/h in the air resistance term, as implied by the constant 21.15), F_j is the acceleration resistance (N), and δ is the rotating mass coefficient.

The traction force F_t available to the vehicle is limited by the output torque range $[T_{\min}, T_{\max}]$ of the drive motors on both sides of the tracked vehicle. With reduction ratio i, transmission efficiency η, and driving wheel radius r, the traction force produced by a motor torque T is:

F_{t} = \frac{Ti η}{r}

(8)
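To make the use of equations (7) and (8) concrete, the sketch below evaluates the four resistance terms and then inverts equation (8) for the motor torque. It assumes that the speed in the air resistance term is in km/h (implied by the constant 21.15), that the acceleration is in m/s², and that T denotes the combined torque of both motors; the function names and numbers are illustrative only.

```python
import math


def traction_force(mass, f, slope_rad, c_d, area, v_kmh, delta, accel, g=9.81):
    """Equation (7): rolling + gradient + air + acceleration resistance (N)."""
    weight = mass * g
    rolling = f * weight
    grade = weight * math.sin(slope_rad)
    aero = c_d * area * v_kmh ** 2 / 21.15  # v in km/h (assumed from 21.15)
    inertia = delta * mass * accel          # accel in m/s^2
    return rolling + grade + aero + inertia


def required_motor_torque(f_t, ratio, eta, r):
    """Equation (8) solved for T: T = F_t * r / (i * eta)."""
    return f_t * r / (ratio * eta)


# Illustrative values only (assumed, not from the paper)
f_t = traction_force(mass=1000.0, f=0.05, slope_rad=0.0, c_d=1.0,
                     area=2.115, v_kmh=10.0, delta=1.1, accel=1.0, g=10.0)
print(f_t, required_motor_torque(f_t, ratio=10.0, eta=0.92, r=0.25))
```

The resulting torque can then be compared with the admissible range $[T_{\min}, T_{\max}]$ to check whether a requested acceleration is achievable.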

Dynamic analysis during steering

By adjusting the driving torque of the two motors on both sides to regulate the speed of the tracks on both sides, the tracked vehicle can achieve steering motion. The drive motor can operate in four quadrants, and the typical steering conditions for the dual motor independent drive tracked vehicle are shown in Figure 3.

Figure 3.

The typical steering conditions for the dual motor independent drive tracked vehicle and the operating characteristics of the four quadrants of the motor.

As shown in Figure 4, the steering speed of the tracks on both sides is different, and the direction of the longitudinal friction force F_fi is opposite to the direction of the track speed, namely:

\left\{ \begin{matrix} F_{fi} = sign (V_{i}) F_{f} \\ F_{f} = \frac{μ mg}{2} \end{matrix} \right.

(9)

Where μ is the friction resistance coefficient between the track and the ground, and sign(·) is the sign function:

sign (V_{i}) = \begin{cases} 1, & V_{i} > 0 \\ 0, & V_{i} = 0 \\ - 1, & V_{i} < 0 \end{cases}

Figure 4.

Schematic diagram of steering force analysis for the tracked vehicle: (a) turning at low speed and (b) turning at high speed.

According to the normal load per unit length on the grounding area of the track, the lateral friction resistance of the corresponding grounding area track can be obtained.

The friction coefficient between the track and the ground is μ, which is approximately equal to the friction coefficient between the track shoe and the ground. The normal load per unit length on the ground contact section of each track is p = G/(2L), and d is the distance by which the center of mass shifts during high-speed turning. The lateral frictional resistance of the front and rear sections of the track during steering can be expressed as:

\left\{ \begin{matrix} F_{yf 1} = F_{yf 2} = \int_{0}^{\frac{L}{2} - d} μ p dx = \frac{μ G}{2 L} (\frac{L}{2} - d) \\ F_{yr 1} = F_{yr 2} = \int_{0}^{\frac{L}{2} + d} μ p dx = \frac{μ G}{2 L} (\frac{L}{2} + d) \end{matrix} \right.

(10)

The steering resistance moment M_y caused by the lateral frictional resistance is:

M_{y} = \frac{μ GL}{4} [1 + {(\frac{2 d}{L})}^{2}]

(11)
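Equation (11) follows from taking moments of the distributed lateral friction in equation (10) about the shifted steering center. The sketch below checks this numerically by integrating μ p x over the front and rear sections of both tracks and comparing with the closed form. It assumes a uniform load p = G/(2L) per track, as implied by equation (10); the function names and numbers are illustrative.

```python
def steering_moment_numeric(mu, weight, length, d, n=1000):
    """Sum of the moments of mu * p * dx about the steering center, both tracks."""
    p = weight / (2.0 * length)  # uniform normal load per unit length, one track

    def moment(span):
        # midpoint rule for the integral of mu * p * x over [0, span]
        h = span / n
        return sum(mu * p * (k + 0.5) * h * h for k in range(n))

    per_track = moment(length / 2.0 - d) + moment(length / 2.0 + d)
    return 2.0 * per_track


def steering_moment_closed_form(mu, weight, length, d):
    """Equation (11): M_y = mu * G * L / 4 * (1 + (2 d / L)^2)."""
    return mu * weight * length / 4.0 * (1.0 + (2.0 * d / length) ** 2)


# Illustrative values only (assumed, not from the paper)
args = dict(mu=0.6, weight=20000.0, length=2.0, d=0.1)
print(steering_moment_numeric(**args), steering_moment_closed_form(**args))
```

Setting d = 0 recovers the familiar μGL/4 term that appears in equation (13).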

The resistance moment M_f caused by longitudinal friction is expressed as:

\left\{ \begin{matrix} M_{f} = τ (\cdot) \cdot B F_{f} \\ τ (\cdot) = \frac{sign (V_{R}) - sign (V_{L})}{2} \end{matrix} \right.

(12)

Neglecting centrifugal force, the total steering resistance moment of tracked vehicles during steering M_zu is:

M_{zu} = \frac{μ GL}{4} + τ (\cdot) B F_{f}

(13)

The constraint expression for the resistance coefficient is as follows:

μ_{min} < μ < μ_{max}

(14)

According to Newton's second law, the longitudinal, lateral, and yaw motions during steering can be expressed as:

\left\{ \begin{matrix} m (a_{x} - v_{x} ω \tan β) = F_{L} + F_{R} - 2 σ (\cdot) F_{f} \\ m (a_{y} + v_{x} ω) = - F_{yf 1} - F_{yf 2} + F_{yr 1} + F_{yr 2} \\ J \frac{d ω}{dt} = (F_{R} - F_{L}) \cdot \frac{B}{2} - M_{zu} \end{matrix} \right.

(15)

Where $σ (\cdot) = \frac{sign (V_{L}) + sign (V_{R})}{2}$.

f is defined as the coefficient of vehicle driving resistance, so that the longitudinal resistance of each track is F_f = fmg/2. Neglecting the center-of-mass shift d, so that the lateral friction forces in equation (10) cancel, and substituting equations (9), (10), and (13) into equation (15), the following expression can be obtained:

\left\{ \begin{matrix} {\overset{\cdot}{v}}_{x} = \frac{F_{L} + F_{R}}{m} - σ (\cdot) fg + v_{x} ω \tan β \\ {\overset{\cdot}{v}}_{y} = - v_{x} ω \\ \overset{\cdot}{ω} = \frac{1}{J} \left[ (F_{R} - F_{L}) \cdot \frac{B}{2} - \frac{μ GL}{4} - τ (\cdot) B F_{f} \right] \end{matrix} \right.

(16)
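The steering dynamics can be evaluated directly from equations (13) and (15). The following sketch is one possible implementation under stated assumptions: the longitudinal resistance per track is taken as F_f = f m g / 2, the center-of-mass shift d is neglected, and, because equation (15) subtracts M_zu unconditionally, the sketch applies to a turn with ω > 0 only. The parameter dictionary keys and the numbers are hypothetical.

```python
import math


def sign(x):
    """Sign function as defined after equation (9)."""
    return (x > 0) - (x < 0)


def body_dynamics(v_x, omega, beta, f_l, f_r, v_l, v_r, p, g=9.81):
    """Return (v_x_dot, v_y_dot, omega_dot) from equations (13) and (15)."""
    m, J, B, L = p["m"], p["J"], p["B"], p["L"]
    G = m * g
    F_f = p["f"] * G / 2.0                         # per-track resistance (assumed form)
    sigma = (sign(v_l) + sign(v_r)) / 2.0
    tau = (sign(v_r) - sign(v_l)) / 2.0
    M_zu = p["mu"] * G * L / 4.0 + tau * B * F_f   # equation (13), d neglected
    v_x_dot = (f_l + f_r) / m - 2.0 * sigma * F_f / m + v_x * omega * math.tan(beta)
    v_y_dot = -v_x * omega
    omega_dot = ((f_r - f_l) * B / 2.0 - M_zu) / J  # valid for a turn with omega > 0
    return v_x_dot, v_y_dot, omega_dot


# Illustrative values only (assumed, not from the paper)
params = {"m": 1000.0, "J": 500.0, "B": 1.5, "L": 2.0, "f": 0.05, "mu": 0.5}
print(body_dynamics(1.0, 0.2, 0.0, 1000.0, 3000.0, 1.0, 1.0, params))
```

In this example the driving moment (F_R − F_L)B/2 is smaller than M_zu, so the yaw rate decays, which illustrates why the track force difference must exceed the steering resistance to sustain a turn.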

Preview trajectory tracking MPC control strategy

Model predictive control (MPC)

The design of a model predictive controller includes three parts: establishing a prediction model, constructing an objective function, and obtaining the optimal control quantities. The design process is shown in Figure 5.

Figure 5.

MPC design framework.

In Figure 5, y(t) is the actual output of the controlled object at the current time, T is the sampling period of the MPC controller, N_p is the prediction horizon, N_c is the control horizon, $Y_{ref} (t)$ is the output reference value of the system, $U (t) = [u (t | t), u (t + 1 | t), \dots u (t + N_{c} - 1 | t)]^{T}$ is the optimal control sequence over N_c, and $Y (t) = [y (t + 1 | t)^{T} \dots y (t + N_{p} | t)^{T}]^{T}$ is the predicted output sequence over N_p.

Based on the fed-back estimate of the current state $x^{*} (t)$ and the disturbance variables w(t), the prediction model relates the control sequence U(t) to the predicted outputs Y(t). Combined with $Y_{ref} (t)$ , the objective function is constructed, the constraints are imposed, and the optimization problem is solved. The first element $u (t | t)$ of the optimized sequence $U (t)$ is applied to the controlled system, and the optimization is then repeated at the next time step. Repeating this process realizes rolling optimization and feedback control.

The MPC based on kinematics of tracked vehicles

The establishment and discretization of prediction models

According to Figure 2, the preview behavior of a driver is considered first. With the preview distance set to the constant L_d, the preview point P_ref on the reference trajectory is found. Its coordinates in the global coordinate system are $(X_{ref}, Y_{ref})$ , and the expected longitudinal velocity at P_ref is v_d. The curvature and tangential angle of the reference trajectory at P_ref are ρ and φ_ref, respectively.

From Figure 2, the state equation of the error model is obtained as follows:

\left\{ \begin{matrix} {\overset{\cdot}{y}}_{e} = v_{x} φ_{e} - ω L_{d} \\ {\overset{\cdot}{φ}}_{e} = v_{x} ρ - ω \end{matrix} \right.

(17)

The lateral deviation y_e and the heading angle deviation φ_e are selected as the state variables, with the yaw rate ω as the control variable. v_x is the actual longitudinal speed at the current moment, the curvature ρ at the reference point is treated as a measurable disturbance, and y_e and φ_e are also the output variables. Equation (17) is written in state-space form as follows:

\left\{ \begin{matrix} \overset{\cdot}{x} = Ax + Bu + Dw \\ y = Cx \end{matrix} \right.

(18)

Where $x = {[y_{e}, φ_{e}]}^{T}, u = ω, w = ρ$ , and the coefficient matrices in equation (18) are:

\begin{matrix} A = [\begin{matrix} 0 & v_{x} \\ 0 & 0 \end{matrix}], B = [\begin{matrix} - L_{d} \\ - 1 \end{matrix}]; \\ D = [\begin{matrix} 0 \\ v_{x} \end{matrix}], C = [\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}] . \end{matrix}

The continuous state-space equation (18) is discretized and represented as follows:

\left\{ \begin{matrix} x (k + 1) = A_{kt} x (k) + B_{kt} u (k) + D_{kt} w (k) \\ y (k) = C_{kt} x (k) \\ u (k) = u (k - 1) + Δ u (k) \end{matrix} \right.

(19)

Where $A_{kt} = e^{AT}$ , $B_{kt} = \int_{0}^{T} e^{A τ} d τ \cdot B$ , $D_{kt} = \int_{0}^{T} e^{A τ} d τ \cdot D$ , and $C_{kt} = C$ .
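For the kinematic error model the discretization can be written in closed form, because the matrix A in equation (18) satisfies A² = 0, so the matrix exponential is exactly I + AT and its integral over one sampling period is IT + AT²/2. The sketch below uses this property; the function name and the numerical values are illustrative assumptions.

```python
import numpy as np


def discretize_error_model(v_x, preview, T):
    """Exact zero-order-hold discretization of equation (18) for fixed v_x."""
    A = np.array([[0.0, v_x], [0.0, 0.0]])
    B = np.array([[-preview], [-1.0]])
    D = np.array([[0.0], [v_x]])
    I = np.eye(2)
    A_d = I + A * T                   # exp(A T), exact because A @ A == 0
    Gamma = I * T + A * T ** 2 / 2.0  # integral of exp(A tau) over [0, T]
    return A_d, Gamma @ B, Gamma @ D


# Illustrative values only (assumed, not from the paper)
A_d, B_d, D_d = discretize_error_model(v_x=2.0, preview=1.5, T=0.05)
print(A_d, B_d.ravel(), D_d.ravel())
```

For models whose A matrix is not nilpotent, a general matrix exponential routine would be needed instead of this shortcut.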

Furthermore, to improve computational efficiency, the coefficient matrices are held constant over the prediction horizon:

\left\{ \begin{matrix} A_{kt} = A_{tt}, k = t, t + 1, \dots, t + N_{P} - 1 \\ B_{kt} = B_{tt}, k = t, t + 1, \dots, t + N_{P} - 1 \end{matrix} \right.

(20)

In summary, the predicted output of the discrete system at time t is given below. Since $C_{kt} = C$ is the identity matrix, the predicted outputs coincide with the predicted states.

Y (t) = \tilde{A} x (t) + {\tilde{B}}_{u} u (t - 1) + {\tilde{B}}_{Δ U} Δ U (t) + \tilde{D} W (t)

Where,

\begin{matrix} Y (t) = [y (t + 1 | t)^{T}, \dots, y (t + N_{P} | t)^{T}]^{T}, \\ Δ U (t) = [Δ u (t | t)^{T}, \dots, Δ u (t + N_{C} - 1 | t)^{T}]^{T}, \\ W (t) = [w (t | t)^{T}, \dots, w (t + N_{P} - 1 | t)^{T}]^{T}, \\ \tilde{A} = [A_{kt}, A_{kt}^{2}, \dots, A_{kt}^{N_{C}}, \dots, A_{kt}^{N_{P}}]^{T}, \\ {\tilde{B}}_{u} = [B_{kt}, \dots, \sum_{i = 0}^{N_{C}} A_{kt}^{i} B_{kt}, \dots, \sum_{i = 0}^{N_{P} - 1} A_{kt}^{i} B_{kt}]^{T}, \end{matrix}

\begin{matrix} {\tilde{B}}_{Δ U} = [\begin{matrix} B_{kt} & \dots & 0 \\ ⋮ & ⋱ & ⋮ \\ \sum_{i = 0}^{N_{C}} A_{kt}^{i} B_{kt} & \dots & A_{kt} B_{kt} + B_{kt} \\ ⋮ & ⋱ & ⋮ \\ \sum_{i = 0}^{N_{P} - 1} A_{kt}^{i} B_{kt} & \dots & \sum_{i = 0}^{N_{P} - N_{C}} A_{kt}^{i} B_{kt} \end{matrix}], \\ \tilde{D} = [\begin{matrix} D_{kt} & 0 & \dots & 0 \\ A_{kt} D_{kt} & D_{kt} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ A_{kt}^{N_{P} - 1} D_{kt} & A_{kt}^{N_{P} - 2} D_{kt} & \dots & D_{kt} \end{matrix}] . \end{matrix}

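Beyond the control horizon, the control increment is zero, so the control input is held at its last optimized value: $u (t + N_{C} + j | t) = u (t + N_{C} - 1 | t)$ for $0 \leq j \leq N_{P} - N_{C} - 1$ .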
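The stacked prediction matrices can be assembled programmatically. The sketch below is one way to do this under the assumptions already stated in the text: constant discrete matrices over the horizon, C equal to the identity, and zero control increments beyond the control horizon. The function name is hypothetical, and the block in row j and column m + 1 of the increment matrix is the partial sum of A^i B for i = 0 to j − 1 − m.

```python
import numpy as np


def prediction_matrices(A, B, D, n_p, n_c):
    """Build A_tilde, B_u, B_dU and D_tilde for Y = A_t x + B_u u_prev + B_dU dU + D_t W."""
    nx, nu, nw = A.shape[0], B.shape[1], D.shape[1]
    powers = [np.linalg.matrix_power(A, i) for i in range(n_p + 1)]
    # S[j] = sum_{i=0}^{j-1} A^i B, with S[0] a zero block
    S = [np.zeros((nx, nu))]
    for j in range(1, n_p + 1):
        S.append(S[-1] + powers[j - 1] @ B)
    A_t = np.vstack(powers[1:])
    B_u = np.vstack(S[1:])
    B_du = np.zeros((n_p * nx, n_c * nu))
    D_t = np.zeros((n_p * nx, n_p * nw))
    for j in range(1, n_p + 1):            # prediction step t + j
        rows = slice((j - 1) * nx, j * nx)
        for m in range(min(j, n_c)):       # increment applied at t + m
            B_du[rows, m * nu:(m + 1) * nu] = S[j - m]
        for m in range(j):                 # disturbance acting at t + m
            D_t[rows, m * nw:(m + 1) * nw] = powers[j - 1 - m] @ D
    return A_t, B_u, B_du, D_t


# Illustrative discrete model (assumed values), N_P = 15 and N_C = 3 as in the text
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[-0.0775], [-0.05]])
D = np.array([[0.005], [0.1]])
A_t, B_u, B_du, D_t = prediction_matrices(A, B, D, n_p=15, n_c=3)
print(A_t.shape, B_u.shape, B_du.shape, D_t.shape)
```

A convenient check of such a construction is to compare the stacked prediction with a step-by-step simulation of equation (19) for an arbitrary increment sequence.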

The choice of N_P strongly affects the accuracy of trajectory tracking control, while N_C determines the dimension of the optimal control sequence and therefore the computational cost. In this paper, N_P = 15 and N_C = 3 are selected according to equations (21) and (22) and then confirmed by simulation tests.

\frac{p_{a} t_{F}}{T} \leq N_{P} \leq \frac{t_{F}}{T}

(21)

Where p_a is generally set to 1/4–1/3 depending on the controlled object, and t_F is the dynamic response time of the system.

N_{C} \in \{ 0.1 N_{P} \leq N_{C} \leq 0.2 N_{P} \} \cap \{ 3 \leq N_{C} \}

(22)
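As a quick arithmetic check of equation (22), the sketch below enumerates the integer control horizons that are admissible for a given prediction horizon. The function name is hypothetical, and integer arithmetic is used so that the bounds 0.1 N_P and 0.2 N_P are tested without rounding effects.

```python
def admissible_control_horizons(n_p):
    """Integers N_C with 0.1 * N_P <= N_C <= 0.2 * N_P and N_C >= 3 (equation (22))."""
    # 0.1 * N_P <= N_C  is  N_P <= 10 * N_C;  N_C <= 0.2 * N_P  is  5 * N_C <= N_P
    return [n_c for n_c in range(3, n_p + 1) if n_p <= 10 * n_c and 5 * n_c <= n_p]


print(admissible_control_horizons(15))  # [3]
```

For N_P = 15 the only admissible value is N_C = 3, which matches the choice made in the text.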

Objective function and constraints

Firstly, in order to improve the trajectory tracking accuracy of the controller, it is desired that the lateral deviation and heading angle deviation of the controlled vehicle tend to zero. The objective function is set as follows:

J_{1} = \sum_{i = 1}^{N_{P}} [‖ y_{e} (t + i | t) - y_{e_ref} (t + i | t) ‖_{Q_{y_{e}}}^{2} + ‖ φ_{e} (t + i | t) - φ_{e_ref} (t + i | t) ‖_{Q_{φ_{e}}}^{2}]

(23)

Where, $y_{e_ref} = 0, φ_{e_ref} = 0$ , $Q_{y_{e}}$ and $Q_{φ_{e}}$ are the weight values used to describe the lateral deviation and the heading angle deviation, respectively.

Equation (23) is written in matrix form as follows:

J_{1} = | | Y (t) - Y_{ref} (t) | |_{Q}^{2}

(24)

Where $Y = [y_{e}, φ_{e}]^{T}$ , $Y_{ref} = [y_{e_ref}, φ_{e_ref}]^{T}$ , and

Q = diag (Q_{y_{e}}, Q_{φ_{e}}) .

In addition, it is usually necessary to take into account the rate of change of the control variable, and the second term of the objective function is set to:

\underset{Δ U}{min J_{2}} = ‖ Δ U (t) ‖_{R}^{2}

(25)

Where, R is the weight matrix of the MPC controller control term. J₂ is generally used to describe the comfort level of riding, but the rate of change of the control variable can also affect the control effect to some extent. For example, an excessively large rate of change of the control variable can easily lead to overshoot during trajectory tracking.

Combining J₁ and J₂, the objective function of the model predictive controller is designed as follows:

\underset{Δ U}{min J} = ‖ Y (t) - Y_{ref} (t) ‖_{Q}^{2} + ‖ Δ U (t) ‖_{R}^{2} + σ ε^{2}

(26)

Where $Y_{ref} (t) = [y_{ref} (t + 1 | t)^{T}, \dots, y_{ref} (t + N_{P} | t)^{T}]^{T}$ is the expected output (i.e. expected lateral deviation, expected heading deviation), σ is the weight coefficient of the relaxation term (not to be confused with the function σ(·) in equation (15)), and ε is the relaxation factor.

This objective function has three terms. The first ensures tracking accuracy with respect to the reference output, that is, the desired path. The second keeps the control increments small so that the vehicle turns smoothly. The third penalizes the relaxation factor ε, which softens the constraints so that the optimization remains feasible.

Due to the lack of consideration of dynamics factors and motor characteristics in the kinematics model, only soft constraints are applied to state variables and control variables:

y_{e_min} \leq y_{e} \leq y_{e_max}

(27)

φ_{e_min} \leq φ_{e} \leq φ_{e_max}

(28)

ω_{min} \leq ω \leq ω_{max}

(29)

Where the lateral deviation at each moment must not be too large and is bounded by $| y_{e_\min} | = | y_{e_\max} | = 1.5 m$ ; the heading angle deviation is also kept in a small range to satisfy the small-angle assumption, $| φ_{e_\min} | = | φ_{e_\max} | = 0.52 rad$ . The limit on the yaw rate is determined by the current vehicle speed, the curvature at the reference point, and the performance of the controlled vehicle, namely $| ω_{\min} | = | ω_{\max} | = | v_{x} ρ |$ .

The optimal solution

The objective function is minimized by a quadratic programming algorithm. Equation (26) is first transformed into the following standard quadratic form:

J = \frac{1}{2} {[\begin{matrix} Δ U \\ ε \end{matrix}]}^{T} Θ [\begin{matrix} Δ U \\ ε \end{matrix}] + Ξ [\begin{matrix} Δ U \\ ε \end{matrix}]

(30)

Solving equation (30) gives the optimal control increment sequence $Δ U (t)$ . The first element of the sequence is added to the current control variable to obtain the control variable for the next step. Through constrained multi-objective optimization, the optimal control sequence is obtained by rolling optimization in real time.
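The matrices Θ and Ξ in equation (30) are not written out explicitly in the text. The sketch below shows one way they can be obtained by expanding equation (26) with the predicted output substituted, where E denotes the free response minus the reference, E = Ã x(t) + B̃_u u(t−1) + D̃ W(t) − Y_ref(t). The constraints (27) to (29) are ignored here, so the sketch only returns the unconstrained minimizer; a constrained QP solver would be needed in practice. All names and numerical values are assumptions made for illustration.

```python
import numpy as np


def qp_terms(B_du, E, Q_bar, R_bar, sigma):
    """Theta and Xi of J = 0.5 z^T Theta z + Xi z with z = [dU; eps] (constant term dropped)."""
    n = B_du.shape[1]
    Theta = np.zeros((n + 1, n + 1))
    Theta[:n, :n] = 2.0 * (B_du.T @ Q_bar @ B_du + R_bar)
    Theta[n, n] = 2.0 * sigma                    # penalty on the relaxation factor
    Xi = np.append(2.0 * E @ Q_bar @ B_du, 0.0)
    return Theta, Xi


def unconstrained_increment(Theta, Xi):
    """Stationary point Theta z + Xi = 0 (no inequality constraints applied)."""
    return np.linalg.solve(Theta, -Xi)


# Illustrative data only: 3 prediction steps, 2 outputs, 2 control increments
rng = np.random.default_rng(0)
B_du = rng.normal(size=(6, 2))
E = rng.normal(size=6)
Q_bar = np.diag([10.0, 5.0] * 3)   # block-diagonal stack of diag(Q_ye, Q_phie)
R_bar = 0.1 * np.eye(2)
Theta, Xi = qp_terms(B_du, E, Q_bar, R_bar, sigma=100.0)
z = unconstrained_increment(Theta, Xi)
print("dU* =", z[:2], "eps* =", z[2])
```

Without active constraints the relaxation factor is zero at the optimum; it only becomes nonzero when a softened constraint would otherwise make the problem infeasible.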

The MPC based on dynamics of tracked vehicles

The establishment and discretization of prediction models

The kinematic preview model in equation (17) provides the path-related error states, including the lateral deviation y_e and the heading deviation φ_e, whereas the dynamic model in equation (16) captures the coupled longitudinal–lateral dynamics through the longitudinal velocity v_x and the yaw rate ω. In the proposed controller, these four quantities are stacked into an augmented state vector.

Based on the kinematics and dynamics analysis of tracked vehicle steering in Section 2, the dynamics-based prediction equation is constructed as follows:

\left\{ \begin{matrix} {\overset{\cdot}{y}}_{e} = v_{x} φ_{e} + v_{x} \tilde{β} - ω L_{d} \\ {\overset{\cdot}{φ}}_{e} = v_{x} ρ - ω \\ \overset{\cdot}{ω} = \frac{i η B}{2 rJ} (T_{R} - T_{L}) - \frac{M_{z} + τ (\cdot) B F_{f}}{J} \\ {\overset{\cdot}{v}}_{x} = \frac{i η}{mr} (T_{R} + T_{L}) + v_{x} ω \tilde{β} - \frac{2 σ (\cdot) F_{f}}{m} \end{matrix} \right.

(31)

Where y_e, φ_e, ω, and v_x are selected as the state variables; the driving torques $[T_{R}, T_{L}]$ of the motors on both sides are the control variables; $σ (\cdot)$ , $τ (\cdot)$ , the curvature ρ at the reference point, and the real-time estimated sideslip angle $\tilde{β}$ at the center of mass are the measurable disturbance variables; and y_e, φ_e, and v_x are selected as the output variables. By comparison with equation (13), M_z = μGL/4 is the lateral-friction part of the steering resistance moment. The nonlinear state-space equation for trajectory tracking control is established as follows:

\left\{ \begin{matrix} \overset{\cdot}{ξ} = f (ξ, u, d_{m}) \\ η = h (ξ) \end{matrix} \right.

(32)

Where $ξ = {[y_{e}, φ_{e}, ω, v_{x}]}^{T}$ , $u = [\begin{matrix} T_{R} & T_{L} \end{matrix}]^{T}$ , $d_{m} = {[σ, τ, ρ, \tilde{β}]}^{T}$ , $η = {[y_{e}, φ_{e}, v_{x}]}^{T}$ .

Since the dynamic model (equation (32)) is nonlinear, a successive linearization strategy is adopted. At each sampling instant, the model is linearized around the current state and input $(ξ_{k}, u_{k})$ using the Jacobian matrices:

A_{k} = {\frac{\partial f (ξ, u, d_{m})}{\partial ξ} |}_{ξ_{k}, u_{k}}, B_{k} = {\frac{\partial f (ξ, u, d_{m})}{\partial u} |}_{ξ_{k}, u_{k}}

(33)

The resulting local linear time-varying (LTV) model is then discretized via the backward difference method and used as the prediction model in the MPC formulation.
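The successive linearization step can be sketched as follows. This is an illustration under several assumptions rather than the implementation used in the paper: the longitudinal equation is written as derived from equations (8) and (15), with the traction term divided by the mass and the resistance term subtracted; the Jacobians are approximated by central differences instead of analytic derivatives; "backward difference" is read as the backward Euler rule; and the affine offset of the linearization is omitted. The parameter names in the dictionary and all numbers are hypothetical.

```python
import numpy as np


def f(xi, u, d, p):
    """Dynamic prediction model; xi = [y_e, phi_e, omega, v_x], u = [T_R, T_L]."""
    y_e, phi_e, omega, v_x = xi
    T_R, T_L = u
    sigma, tau, rho, beta = d
    k = p["i"] * p["eta"] / p["r"]  # torque-to-traction-force gain, equation (8)
    return np.array([
        v_x * phi_e + v_x * beta - omega * p["L_d"],
        v_x * rho - omega,
        (k * p["B"] / 2.0 * (T_R - T_L) - p["M_z"] - tau * p["B"] * p["F_f"]) / p["J"],
        k * (T_R + T_L) / p["m"] + v_x * omega * beta - 2.0 * sigma * p["F_f"] / p["m"],
    ])


def jacobians(xi, u, d, p, eps=1e-6):
    """Central-difference approximations of A_k = df/dxi and B_k = df/du."""
    xi, u = np.asarray(xi, float), np.asarray(u, float)
    A = np.column_stack([(f(xi + eps * e, u, d, p) - f(xi - eps * e, u, d, p)) / (2 * eps)
                         for e in np.eye(xi.size)])
    B = np.column_stack([(f(xi, u + eps * e, d, p) - f(xi, u - eps * e, d, p)) / (2 * eps)
                         for e in np.eye(u.size)])
    return A, B


def backward_euler(A, B, T):
    """x_{k+1} = x_k + T (A x_{k+1} + B u_k), so A_d = (I - T A)^-1 and B_d = T A_d B."""
    A_d = np.linalg.inv(np.eye(A.shape[0]) - T * A)
    return A_d, T * A_d @ B


# Illustrative values only (assumed, not from the paper)
p = {"i": 10.0, "eta": 0.9, "r": 0.25, "B": 1.5, "J": 500.0,
     "m": 1000.0, "L_d": 2.0, "M_z": 2000.0, "F_f": 250.0}
A_k, B_k = jacobians([0.1, 0.05, 0.2, 3.0], [100.0, 80.0], (1.0, 0.0, 0.05, 0.01), p)
A_d, B_d = backward_euler(A_k, B_k, T=0.05)
print(np.round(A_k, 4), np.round(B_k, 4), sep="\n")
```

Because σ(·) and τ(·) are treated as measured disturbances, they are held fixed during differentiation, which avoids differentiating the discontinuous sign functions.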

Objective function construction and constraints

Because a coupled lateral–longitudinal control strategy is used, the tracking targets and predicted outputs include not only the lateral deviation and heading angle deviation but also the longitudinal velocity, that is, $η = {[y_{e}, φ_{e}, v_{x}]}^{T}$ . The corresponding reference value is $η_{ref} = {[y_{er}, φ_{er}, v_{xd}]}^{T}$ , and the reference sequence over the prediction horizon is $Y_{ref} (t) = [η_{ref} (t + 1 | t)^{T}, \dots, η_{ref} (t + N_{P} | t)^{T}]^{T}$ .

The objective function of the trajectory tracking control based on dynamics prediction model is:

\underset{Δ U}{min J} = ‖ Y (t) - Y_{ref} (t) ‖_{Q}^{2} + ‖ Δ U (t) ‖_{R}^{2} + σ ε^{2}

(34)

Where $Q = diag (Q_{y_{e}}, Q_{φ_{e}}, Q_{v_{x}})$ is the weight matrix. The objective function can also be written term by term in the following form:

\begin{matrix} \underset{Δ U}{min J} = \sum_{k = t + 1}^{t + N_{P}} \left[ ‖ y_{e} (k) - y_{er} (k) ‖_{Q_{y_{e}}}^{2} + ‖ φ_{e} (k) - φ_{er} (k) ‖_{Q_{φ_{e}}}^{2} \right. \\ \left. + ‖ v_{x} (k) - v_{xd} (k) ‖_{Q_{v_{x}}}^{2} \right] + ‖ Δ U (t) ‖_{R}^{2} + σ ε^{2} \end{matrix}

(35)
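To make the structure of equation (35) concrete, the following Python sketch evaluates the cost for given predicted outputs. It assumes diagonal output weights, a single scalar weight R shared by both torque increments, and a scalar slack term; the function name and the numerical values are illustrative only and are not taken from the paper.

```python
from typing import Sequence


def tracking_cost(
    outputs: Sequence[Sequence[float]],     # predicted [y_e, phi_e, v_x] per horizon step
    references: Sequence[Sequence[float]],  # reference [y_er, phi_er, v_xd] per horizon step
    delta_u: Sequence[Sequence[float]],     # torque increments [dT_R, dT_L] per control step
    q_weights: Sequence[float],             # [Q_ye, Q_phie, Q_ve]
    r_weight: float,                        # scalar R (assumption)
    slack_weight: float,                    # sigma in equation (34)
    slack: float,                           # epsilon in equation (34)
) -> float:
    """Weighted squared tracking error plus input-increment and slack penalties."""
    cost = 0.0
    for eta, eta_ref in zip(outputs, references):
        for q, value, ref in zip(q_weights, eta, eta_ref):
            cost += q * (value - ref) ** 2
    for increment in delta_u:
        cost += r_weight * sum(d * d for d in increment)
    return cost + slack_weight * slack ** 2


example_cost = tracking_cost(
    outputs=[[0.1, 0.0, 5.0], [0.2, 0.1, 5.0]],
    references=[[0.0, 0.0, 5.0], [0.0, 0.0, 5.0]],
    delta_u=[[1.0, -1.0]],
    q_weights=[10.0, 5.0, 1.0],
    r_weight=0.5,
    slack_weight=100.0,
    slack=0.0,
)
```

Raising `q_weights[0]` relative to `r_weight` makes the optimizer favor a small lateral deviation at the price of larger torque increments, which is the trade-off that the weight adaptation described later adjusts online.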

Next, the constraints of the trajectory tracking MPC controller are established and include output constraints, control increment constraints, and state variable constraints. The constraints on state variables and output variables are similar to equations (25) to (27).

The constraints on the control variables must reflect both the mechanical structure of the tracked vehicle and the motor characteristics. The driving torques $[T_{R}, T_{L}]$ must satisfy:

\left\{ \begin{matrix} T_{R min} \leq T_{R} \leq T_{R max} \\ T_{L min} \leq T_{L} \leq T_{L max} \end{matrix} \right.

(36)

where $i \in \{R, L\}$ . The limit $T_{i max}$ must not exceed the peak driving torque T_m of the motor and must also satisfy the motor power constraint, namely:

T_{i} \leq 9549 \frac{P_{m}}{n_{i}}

(37)

So, ${| T_{i} |}_{max} = min (T_{m}, 9549 \frac{P_{m}}{n_{i}})$ .
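The speed-dependent torque bound can be sketched in Python as follows. The constant 9549 corresponds to power in kW, speed in r/min, and torque in N·m; that these are the units used for P_m and n_i is an assumption, as is the handling of zero speed, where only the peak torque applies. The numerical values are illustrative.

```python
def torque_limit(peak_torque_nm: float, power_kw: float, speed_rpm: float) -> float:
    """Return |T_i|_max = min(T_m, 9549 * P_m / n_i) for one motor."""
    if speed_rpm == 0.0:
        # At standstill the power constraint is inactive (assumption).
        return peak_torque_nm
    return min(peak_torque_nm, 9549.0 * power_kw / abs(speed_rpm))


low_speed_limit = torque_limit(peak_torque_nm=200.0, power_kw=20.0, speed_rpm=500.0)
high_speed_limit = torque_limit(peak_torque_nm=200.0, power_kw=20.0, speed_rpm=1909.8)
```

At low speed the peak torque is the active bound, while at high speed the constant-power curve takes over, so the torque constraint passed to the optimizer changes with motor speed.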

Taking all the constraints established above into account, equation (34) is rewritten in the standard quadratic programming form:

\left\{ \begin{matrix} \min_{Δ U \in R^{m}} J (Δ U) = \frac{1}{2} Δ U^{T} Θ Δ U + Ψ Δ U \\ \text{s.t.} \quad y_{e, min} \leq y_{e} \leq y_{e, max}, \\ φ_{e, min} \leq φ_{e} \leq φ_{e, max}, \\ ω_{min} \leq ω \leq ω_{max}, \\ | T_{R} | \leq {| T_{R} |}_{max}, \quad | T_{L} | \leq {| T_{L} |}_{max} . \end{matrix} \right.

(38)

Finally, the optimal torque increment sequence is obtained. Its first element is added to the control variable at time t−1 to give the optimal control variable at time t, which is then applied to the vehicle.

The optimal control quantity is determined by minimizing the objective function (equation (34)) while satisfying the constraints. This process can be transformed into a convex optimization problem under the constraints and solved by quadratic programming.
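The solve-and-apply cycle described above can be sketched in Python under simplifying assumptions. The output and state constraints of equation (38) are omitted, simple box bounds on the increments stand in for them, a projected-gradient iteration replaces a dedicated QP solver, and the final torque is clipped to its limit. The matrices and bounds are illustrative, and Ψ is treated as a plain list of linear-term coefficients.

```python
from typing import List, Sequence


def solve_box_qp(
    theta: Sequence[Sequence[float]],
    psi: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    iterations: int = 500,
) -> List[float]:
    """Minimize 0.5*x'Θx + Ψ'x subject to lower <= x <= upper by projected gradient.

    Θ is assumed symmetric positive definite; the step 1/L uses the largest
    absolute row sum as an upper bound L on its largest eigenvalue.
    """
    n = len(psi)
    lipschitz = max(sum(abs(v) for v in row) for row in theta)
    x = [min(max(0.0, lo), hi) for lo, hi in zip(lower, upper)]
    for _ in range(iterations):
        grad = [sum(theta[i][j] * x[j] for j in range(n)) + psi[i] for i in range(n)]
        x = [min(max(x[i] - grad[i] / lipschitz, lower[i]), upper[i]) for i in range(n)]
    return x


def receding_horizon_step(
    u_prev: Sequence[float],
    delta_u_sequence: Sequence[float],
    torque_max: Sequence[float],
) -> List[float]:
    """Apply only the first increment: u(t) = u(t-1) + ΔU(t), clipped to ±|T|_max."""
    first_increment = delta_u_sequence[: len(u_prev)]
    return [
        min(max(u + du, -limit), limit)
        for u, du, limit in zip(u_prev, first_increment, torque_max)
    ]


# Two control steps of [dT_R, dT_L] stacked into one decision vector.
theta = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 2.0]]
psi = [-2.0, -4.0, 0.0, 0.0]
delta_u = solve_box_qp(theta, psi, lower=[-1.5] * 4, upper=[1.5] * 4)
torque = receding_horizon_step(u_prev=[10.0, 10.0], delta_u_sequence=delta_u, torque_max=[11.2, 20.0])
```

Only the first pair of increments is applied; the remaining entries of `delta_u` are discarded and the problem is solved again at the next sampling instant.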

The overall structure of the proposed control strategy is shown in Figure 6.

Figure 6.

MPC-based trajectory tracking control framework with DQN-based weight adaptation.

Adaptive adjustment method of objective function weight matrix based on DQN

The Q-learning algorithm is a classic reinforcement learning algorithm. By continuously interacting with the environment, the agent learns better behavioral strategies and gradually updates the state-action table (Q-table). Each entry of the table represents the maximum expected reward obtainable when executing a given action in a given state. However, when the state space is high-dimensional and contains many states, storing the state-action value function in a Q-table leads to the “curse of dimensionality.”

To address this issue, the DeepMind team integrated deep neural networks with the Q-learning algorithm and proposed the deep Q-network (DQN) algorithm. DQN improves on Q-learning by using a deep neural network f in place of the Q-table and approximating the state-action value function by iteratively updating the network parameters θ. However, the learning target in DQN depends on the parameters being learned, which biases the Q network toward actions that already have high Q values and makes the algorithm unstable. To solve this problem, the double deep Q-network (DDQN) adds a target Q network on top of the original DQN algorithm. The DDQN algorithm process is shown in Figure 7.

Figure 7.

DDQN training process with an experience replay buffer.

The objective of this study is to calculate the optimal control quantity based on the feedback information from tracked vehicles and the reference trajectory and speed established in the previous planning layer, in order to obtain the optimal tracking accuracy. The solution of the optimal control sequence in MPC strategy is transformed into a quadratic programming problem.

Therefore, the task of solving for the optimal control quantity within each time step can be divided into two subtasks:

Determine the optimal weight matrix $[Q_{y_{e}}, Q_{φ_{e}}, Q_{v_{e}}, R]$ based on the vehicle state and the reference trajectory.

Solve the quadratic programming problem with this weight matrix held fixed.

The adaptive adjustment of the MPC weight matrix is realized with the DQN algorithm. At each calculation time step, the current best weights are output and the Q network is updated online at the same time.

The specific steps of the adaptive adjustment method for the weights of MPC based on DQN are as follows:

Step 1: The tracked unmanned vehicle is regarded as the agent, and its external conditions together with some vehicle characteristics (such as track material and mechanical structure) are regarded as the environment. The state space, the action space, the state-transition equation, and the reward function of the DQN algorithm are defined as follows.

1. State space

The state of the tracked unmanned vehicle at time t consists of the current lateral deviation, heading angle deviation, and speed error, together with the weights $W_{t - 1}$ from the previous time step:

s_{t} = [y_{e} (t), φ_{e} (t), v_{e} (t), W_{t - 1}]

(39)

2. Action space

The actions within each time step are adjustments to the weight, therefore the action space is defined as:

a_{t} = Δ W_{t} = [Δ Q_{y_{e}} (t), Δ Q_{φ_{e}} (t), Δ Q_{v_{e}} (t), Δ R (t)]

(40)

where $Δ Q_{y_{e}} (t)$ , $Δ Q_{φ_{e}} (t)$ , $Δ Q_{v_{e}} (t)$ , and $Δ R (t)$ are the changes in the four weight coefficients of the objective function. To improve training efficiency and to keep the vehicle stable while the adjusted weights are used for control, each adjustment should not be too large. The action space is therefore reduced by restricting each of $Δ Q_{y_{e}} (t)$ , $Δ Q_{φ_{e}} (t)$ , $Δ Q_{v_{e}} (t)$ , and $Δ R (t)$ to the values $\{- 0.1, 0, + 0.1\}$ . Each state thus has $num_{a} = 3^{4} = 81$ possible adjustment actions. The next action is selected with the ε-greedy algorithm as follows:

π (a | s) = \left\{ \begin{matrix} ε, & a \neq \arg \max_{a} Q_{ϕ} (s, a), \\ 1 - ε, & a = \arg \max_{a} Q_{ϕ} (s, a) \end{matrix} \right.

(41)

That is, in state s the policy $π (a | s)$ explores a random action with probability ε and, with probability $1 - ε$ , outputs the action with the highest Q value under the current Q network.
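The discrete action space and the ε-greedy choice can be sketched in a few lines of Python. The sketch assumes that the Q network supplies one value per action as a list `q_values` and that exploration draws uniformly from all 81 actions; both are assumptions, and the names are illustrative.

```python
import itertools
import random
from typing import List, Sequence, Tuple

STEP_VALUES = (-0.1, 0.0, 0.1)

# Every combination of [dQ_ye, dQ_phie, dQ_ve, dR]: 3^4 = 81 actions.
ACTIONS: List[Tuple[float, ...]] = list(itertools.product(STEP_VALUES, repeat=4))


def select_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Return an action index: random with probability epsilon, greedy otherwise."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda index: q_values[index])


rng = random.Random(0)
q_values = [0.0] * len(ACTIONS)
q_values[7] = 1.0  # pretend action 7 currently looks best
greedy_index = select_action(q_values, epsilon=0.0, rng=rng)
explored_index = select_action(q_values, epsilon=1.0, rng=rng)
```

With a decaying ε, early episodes mostly explore, while later episodes mostly exploit the learned Q values.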

3. State-transition equation

Once an action $a_{t} = Δ W_{t} = [Δ Q_{y_{e}} (t), Δ Q_{φ_{e}} (t), Δ Q_{v_{e}} (t), Δ R (t)]$ is selected, the weights are adjusted accordingly, and the resulting weights are used in the quadratic programming solution as follows:

\left\{ \begin{matrix} W_{t} = W_{t - 1} + Δ W_{t} \\ Δ U (t) = \arg \min_{Δ U} J (Δ U ; W_{t}) \end{matrix} \right.

(42)

Subsequently, the optimal motor driving torque is obtained and applied to the tracked unmanned vehicle, causing a transition in the state space as follows:

\left\{ \begin{matrix} [T_{R}, T_{L}] = u (t) = u (t - 1) + Δ U (t) \\ s_{t + 1} = f_{real} (s_{t}, u (t)) \end{matrix} \right.

(43)

where f_real represents the feedback state of the actual tracked unmanned vehicle under the given driving torque.

4. Reward function

The ultimate goal of the adaptive weight adjustment is to achieve optimal tracking accuracy, so the negative root of the summed squared lateral and heading deviations is used as the reward:

r_{t} = - \sqrt{y_{e}^{2} + φ_{e}^{2}}

(44)
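The weight update of equation (42), the state vector of equation (39), and the reward of equation (44) can be sketched together in Python. The lower bound that keeps the weights non-negative is an added assumption, since no bound on the weights is stated here, and the function names are illustrative.

```python
import math
from typing import List, Sequence

MIN_WEIGHT = 0.0  # assumption: weights are kept non-negative; the source states no bound


def update_weights(prev_weights: Sequence[float], delta: Sequence[float]) -> List[float]:
    """W_t = W_{t-1} + ΔW_t for [Q_ye, Q_phie, Q_ve, R], floored at MIN_WEIGHT."""
    return [max(MIN_WEIGHT, w + d) for w, d in zip(prev_weights, delta)]


def build_state(y_e: float, phi_e: float, v_e: float, weights: Sequence[float]) -> List[float]:
    """s = [y_e, φ_e, v_e, W]: three tracking errors followed by the four weights."""
    return [y_e, phi_e, v_e, *weights]


def reward(y_e: float, phi_e: float) -> float:
    """r = -sqrt(y_e^2 + φ_e^2): zero is best, larger deviations are penalized."""
    return -math.sqrt(y_e ** 2 + phi_e ** 2)


weights = update_weights([1.0, 1.0, 0.05, 0.5], (0.1, 0.0, -0.1, 0.1))
state = build_state(y_e=0.3, phi_e=0.4, v_e=-0.2, weights=weights)
step_reward = reward(0.3, 0.4)
```

Because the reward ignores the speed error, the speed weight influences the reward only through its effect on the lateral and heading deviations.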

Step 2: Set up the Q network and the target Q network. The two networks share the same architecture and differ only in how often their parameters are updated. The hidden part of the Q network is a feedforward neural network consisting of two fully connected layers with 16 and 8 units, respectively, and ReLU activation functions. The input layer dimension is 7, determined by the state s, and the output layer dimension is 1. The parameters of the Q network at time t are denoted $θ_{t}$ and those of the target Q network $θ_{t}^{'}$ ; both are initialized.
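A forward pass through a network of the stated size (7 inputs, hidden layers of 16 and 8 ReLU units, 1 output) can be sketched in plain Python. The uniform initialization range and the linear output layer are assumptions, and the text does not specify how the 81 action values are read out from a single output, so this sketch covers only the layer structure.

```python
import copy
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)
LAYER_SIZES = [7, 16, 8, 1]


def init_network(rng: random.Random) -> List[Layer]:
    """Create fully connected layers with small random weights and zero biases."""
    layers: List[Layer] = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers: Sequence[Layer], state: Sequence[float]) -> List[float]:
    """Propagate a state through the network; ReLU on hidden layers, linear output."""
    activation = list(state)
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activation)) + b for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activation = pre if is_output else [max(0.0, v) for v in pre]
    return activation


q_network = init_network(random.Random(0))
target_network = copy.deepcopy(q_network)  # same architecture, separately stored parameters
output = forward(q_network, [0.2, 0.05, -0.3, 1.0, 1.0, 1.0, 0.5])
```

Keeping `target_network` as a deep copy means that later updates to `q_network` do not change the target until the parameters are copied again.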

Step 3: Establish an experience replay buffer of transitions $\{[s_{t}, a_{t}, r_{t}, s_{t + 1}]\}$ and randomly sample a training batch from it to train and update the Q network parameters.

The Q value under Q network is defined as $Q (s, a | θ)$ , and the gradient descent method is used to update the parameters of Q network.

θ_{t + 1} = θ_{t} - α \nabla_{θ_{t}} L (θ_{t})

(45)

where α is the learning rate, which controls the size of each parameter update. A small α (such as 10⁻⁴ or 10⁻⁵) is usually used and gradually reduced during training to balance learning speed against stability. $\nabla_{θ_{t}} L (θ_{t})$ is the gradient of the loss function $L (θ_{t})$ with respect to the parameters $θ_{t}$ at time t. The loss function is defined as the mean squared error between the Q value of the prediction network and the target value formed from the reward of equation (44) and the target network:

L (θ_{t}) = E \left[ {\left( r_{t} + γ \max_{a_{t + 1}} Q (s_{t + 1}, a_{t + 1} | θ_{t}^{'}) - Q (s_{t}, a_{t} | θ_{t}) \right)}^{2} \right]

(46)

The above steps complete the update of the Q network at time t. Every k time steps, the target Q network is updated as follows:

θ' = θ

(47)
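Equations (45) to (47) can be illustrated with a small Python sketch. To keep it self-contained, a linear function with one weight row per action stands in for the neural network; this is a deliberate simplification and not the network described in Step 2. The target is treated as a constant when the gradient is taken, and the transitions and hyperparameter values are illustrative.

```python
import copy
from typing import List, Sequence, Tuple

Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]


def q_values(theta: Sequence[Sequence[float]], state: Sequence[float]) -> List[float]:
    """Linear stand-in for Q(s, . | θ): one weight row per action."""
    return [sum(w * s for w, s in zip(row, state)) for row in theta]


def td_update(theta: List[List[float]], theta_target: Sequence[Sequence[float]],
              transition: Transition, alpha: float, gamma: float) -> float:
    """One gradient step on (y - Q(s, a | θ))^2, with y built from the target parameters."""
    state, action, reward, next_state, terminal = transition
    target = reward
    if not terminal:
        target += gamma * max(q_values(theta_target, next_state))
    td_error = target - q_values(theta, state)[action]
    # d/dθ_a (y - θ_a·s)^2 = -2 * td_error * s, so θ_a ← θ_a + 2α * td_error * s.
    theta[action] = [w + 2.0 * alpha * td_error * s for w, s in zip(theta[action], state)]
    return td_error ** 2


def train(theta: List[List[float]], transitions: Sequence[Transition],
          alpha: float, gamma: float, sync_every: int) -> Tuple[List[List[float]], List[float]]:
    """Run updates in order and copy θ into θ' every sync_every steps (equation (47))."""
    theta_target = copy.deepcopy(theta)
    losses = []
    for step, transition in enumerate(transitions, start=1):
        losses.append(td_update(theta, theta_target, transition, alpha, gamma))
        if step % sync_every == 0:
            theta_target = copy.deepcopy(theta)
    return theta_target, losses


theta = [[0.0, 0.0], [0.0, 0.0]]
terminal_transition: Transition = ([1.0, 0.0], 0, -1.0, [0.0, 1.0], True)
theta_target, losses = train(theta, [terminal_transition] * 50, alpha=0.1, gamma=0.99, sync_every=10)
```

Between synchronizations the target values stay fixed, which is what decouples the regression target from the parameters being updated.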

The pseudocode of the algorithm for adaptive adjustment of MPC weights based on DQN is shown in Algorithm 1.

Algorithm 1. MPC weight adaptive adjustment algorithm based on DQN
Input: state space S, action space A, discount rate γ, learning rate α, target network update interval k, exploration rate ε
1: Initialize the experience replay buffer M with capacity N
2: Initialize the Q network parameters θ
3: Synchronize the target Q network parameters $θ^{'} = θ$
4: for episode = 1 : m
5: Initialize the state $s = [y_{e}, φ_{e}, v_{e}, W]$
6: for time = 1 : n
7: Select action a in state s
8: Adjust the weights: $W^{'} = W + Δ W |_{a}$ . The MPC controller uses the weights $W^{'}$ to solve for the torque, applies it to the tracked unmanned vehicle, and measures the resulting changes in the state variables
9: Calculate the reward r
10: Obtain the new environment state s′
11: Store the transition $(s, a, r, s')$ in M
12: Sample transitions $(s, a, r, s')$ from M
13: $y = \left\{ \begin{matrix} r, & \text{if } s^{'} \text{ is the end of the sequence} \\ r + γ \max_{a'} Q_{θ'} (s', a'), & \text{otherwise} \end{matrix} \right.$
14: Train the Q network with the loss function ${(y - Q_{θ} (s, a))}^{2}$
15: Update the target Q network parameters $θ^{'} \leftarrow θ$ every k steps
16: end
17: end
Output: MPC optimal weight adaptive adjustment strategy
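Lines 1, 11, and 12 of Algorithm 1 rely on an experience replay buffer. A minimal Python sketch is given below; the class name, the fixed random seed, and the choice to drop the oldest transition when the buffer is full are assumptions rather than details taken from the paper.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

Transition = Tuple[Sequence[float], int, float, Sequence[float]]


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s') transitions with uniform sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._items: Deque[Transition] = deque(maxlen=capacity)  # oldest entries drop out
        self._rng = random.Random(seed)

    def store(self, state: Sequence[float], action: int, reward: float,
              next_state: Sequence[float]) -> None:
        self._items.append((state, action, reward, next_state))

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a mini-batch without replacement (smaller if the buffer is not yet full)."""
        count = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), count)

    def __len__(self) -> int:
        return len(self._items)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.store([float(step)], step % 2, -0.1 * step, [float(step + 1)])
batch = buffer.sample(batch_size=2)
```

Sampling at random from past transitions breaks the correlation between consecutive time steps, which is the usual reason for training from a buffer instead of from the latest transition only.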

Simulation and experiment

MATLAB/Simulink–RecurDyn joint simulation

The dual-motor-driven tracked vehicle is a typical multivariable nonlinear system, and an analytical mathematical model can hardly describe the actual dynamic behavior of the vehicle. The multi-body simulation software RecurDyn is therefore used to build a 3D virtual model of the tracked vehicle and a road surface model. The tracked vehicle model in RecurDyn is shown in Figure 8, and its dynamics parameters are listed in Table 1. The trajectory tracking control strategy model shown in the figure is built in MATLAB/Simulink. Through co-simulation of RecurDyn and MATLAB/Simulink, trajectory tracking control that accounts for the actual vehicle dynamics and operational behavior is achieved, as shown in Figure 9. The inputs of the RecurDyn module are the torques of the driving wheels on both sides, and the outputs include the driving wheel speeds, the vehicle speed, the trajectory of the vehicle center, the yaw rate, and the longitudinal speed of the tracks.

Figure 8.

The tracked vehicle model in RecurDyn.

Table 1.

Dynamic model parameters of the tracked vehicle in RecurDyn.

Components	Number	Moment of inertia (kg·m²)	Stiffness	Damping coefficient
Capstan	2	279,379	18,000	10
Road wheel	10	685,898	12,000	10
Towing wheel	6	5121	15,000	10
Track shoe	3.51989	1989	9000	10

Figure 9.

MATLAB/Simulink–RecurDyn joint simulation platform.

Simulation results

To verify the advantages of the proposed trajectory tracking control strategy based on the dynamics prediction model over the traditional kinematics-based control strategy in terms of control accuracy and stability, a double lane change trajectory tracking test is designed at vehicle speeds of 18 and 36 km/h on a road with a high adhesion coefficient μ = 0.85 (Figure 10). To verify the effectiveness of the proposed DQN-based weight adaptive adjustment method, a tracking condition is designed at $v = 36 km / h$ .

1. $v = 18 km / h, μ = 0.85$

The screenshot of the visualized trajectory tracking simulation in RecurDyn is shown in Figure 10, and the simulation results of double lane change tracking are shown in Figure 13.

Figure 10.

Screenshot of MPC trajectory tracking visualization simulation based on dynamics prediction model, $v = 18 km / h$ .

Figure 11.

Screenshot of the MPC trajectory tracking visualization simulation based on dynamics prediction model, $v = 36 km / h$ .

Figure 12.

Comparison of simulation results of the MPC control based on kinematics/dynamics prediction model at $v = 36 km / h$ : (a) trajectory, (b) lateral position deviation, (c) heading angle deviation, (d) yaw rate and (e) vehicle speed.

Figure 13(a) to (c) show significant lateral and heading angle deviations when the tracked vehicle turns and changes lanes at t = 9.5 s, t = 13.5 s, and t = 20 s. With the MPC control strategy based on the kinematics prediction model, the maximum lateral deviation is 0.89 m and the maximum heading angle deviation is 0.13 rad. With the control strategy based on the dynamics prediction model, the maximum lateral deviation is 0.21 m and the maximum heading angle deviation is 0.02 rad. Compared with the conventional benchmark kinematic MPC controller, the MPC using the proposed dynamics prediction model reduces the maximum lateral deviation and heading angle deviation by 76.4% and 84.6%, respectively. This indicates that tracked vehicles using the proposed MPC tracking control strategy have better tracking performance during low-speed steering.
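As a check, the quoted reductions at 18 km/h follow directly from the maximum deviations reported above:

\frac{0.89 - 0.21}{0.89} \approx 76.4 \% , \quad \frac{0.13 - 0.02}{0.13} \approx 84.6 \%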

Figure 13.

Comparison of simulation results of the MPC control based on kinematics/dynamics prediction model at $v = 18 km / h$ : (a) trajectory, (b) lateral position deviation, (c) heading angle deviation, (d) yaw rate and (e) vehicle speed.

Figure 13(d) shows that the control strategy based on the dynamics prediction model has a smaller yaw rate, and the controlled tracked unmanned vehicle turns more smoothly.

From Figure 13(e), it can be seen that due to the use of lateral and longitudinal decoupling methods in the kinematics control strategy, the deviation between the output speed and the expected vehicle speed is relatively small. The dynamics based MPC control strategy considers lateral and longitudinal coupling. When the lateral deviation and heading angle deviation are large, multi-objective optimization is used to balance and adjust the longitudinal speed in real time. Therefore, the longitudinal speed will be slightly reduced when the vehicle turns.

2. $v = 36 km / h, μ = 0.85$

The screenshot of the visualized trajectory tracking simulation in RecurDyn is shown in Figure 11, and the simulation results of double lane change tracking are shown in Figure 12.

From Figure 12(a) to (c), it can be seen that there is a significant lateral deviation and heading angle deviation when the tracked vehicle turns and changes lanes at times $t = 5.2 s$ , $t = 9.3 s$ , and $t = 13 s$ . The maximum lateral deviation of the MPC control strategy based on kinematics prediction model is $2.5 m$ , and the maximum heading angle deviation is $0.26 rad$ . The maximum lateral deviation of the control strategy based on the dynamics prediction model is $1.04 m$ , and the maximum heading angle deviation is $0.21 rad$ . Although the lateral deviation and heading angle deviation increase at v = 36 km/h, compared with the conventional benchmark kinematic MPC controller, the maximum lateral deviation and heading angle deviation of the MPC using the proposed dynamics prediction model are reduced by 58.4% and 18.6%, respectively. This proves that the MPC trajectory tracking control strategy based on the dynamics prediction model can effectively improve the trajectory tracking accuracy in the middle and low speed steering process.

As shown in Figure 12(d), the yaw rate of the trajectory tracking strategy based on the kinematics prediction model exceeds the limit of 0.2 rad/s and reaches up to 0.33 rad/s, which reduces driving smoothness. Because the MPC trajectory tracking control based on the dynamics model accounts for dynamic factors, the steering becomes smoother.

As shown in Figure 12(e), the proposed MPC trajectory tracking control strategy based on the dynamics prediction model can ensure tracking accuracy by appropriately adjusting the vehicle speed through multi-objective optimization algorithms.

3. Adaptive weight adjustment based on DQN: $v = 36 km / h, μ = 0.85$

To validate the proposed DQN-based MPC weight adaptive adjustment method, the double lane change tracking experiment is repeated with the trajectory tracking control strategy based on the dynamics model at an expected vehicle speed of $v = 36 km / h$ and high road adhesion $μ = 0.85$ in order to train the DQN network. In theory, as the number of experiments increases, the parameters of the DQN network gradually converge toward the optimum. The hyperparameters are listed in Table 2, and the convergence curves are shown in Figure 14. The tracking results obtained with the empirically chosen weight matrix and with adaptively adjusted weights after 10 and 1000 episodes are shown in Figure 15.

Figure 14.

Convergence curves.

Figure 15.

Comparison of the MPC trajectory tracking results of dynamics prediction models with fixed weight and adaptive weight adjustment: (a) trajectory, (b) lateral position deviation, (c) heading angle deviation, (d) yaw rate and (e) vehicle speed.

As shown in Figure 14, the average loss decreases rapidly in the initial phase and stabilizes at a low value after about 400 episodes, indicating that the prediction error has converged. Meanwhile, the average total reward rises steadily and converges to a stable value of around 800 after 750 episodes. These concurrent trends indicate that the DQN agent has learned a weight adjustment strategy without divergence.

As Figure 15 shows, after 10 episodes the tracking performance is even worse than with the empirical method. With more training episodes, however, the tracking performance improves significantly. Figure 15(b) to (e) show that after 1000 episodes the maximum lateral deviation and heading angle deviation decrease by a further 36.6% and 19.7%, respectively.

Experimental results

A dual-motor-driven tracked unmanned vehicle was built, as shown in Figure 16, with its parameters listed in Table 3, to further verify the control effect and real-time performance of the proposed trajectory tracking control strategy based on the dynamics prediction model on an actual vehicle.

Figure 16.

A real vehicle platform for the dual motor driven tracked vehicle.

Table 2.

Hyperparameters of the DQN algorithm.

Parameters	Value
Learning rate	0.001
Discount factor	0.99
Max episode	1000
Batch size	64
Replay buffer size	50k
Exploration	Decaying from 1.0 to 0.01

Table 3.

Tracked unmanned vehicle parameters.

Parameters	Numerical value
Track gauge, B/mm	1300
Number of road wheels, 2n	10
Driving wheel radius, $r_{z} / mm$	150
Track grounding length, L/m	1.7
Rolling resistance coefficient, f	0.04
Maximum steering drag coefficient, $μ_{max}$	1
Gear ratio, i	6.35
Vehicle mass, m/kg	120
Mass increase coefficient, δ	1.5
Moment of inertia, $J / (kg \cdot m^{2})$	300

Owing to safety restrictions at the test site, the expected speed was set to v = 10.8 km/h. The experimental results are shown in Figure 17, and the test site is shown in Figure 18.

Figure 17.

Experimental results of trajectory tracking for the tracked unmanned vehicle: (a) trajectory, (b) vehicle speed, (c) lateral deviation, (d) course deviation, (e) yaw rate, (f) motor power, (g) motor speed and (h) driving torque.

Figure 18.

Experimental site of trajectory tracking for the tracked unmanned vehicle.

Because of the limitations of the experimental site and to ensure the safety of the vehicle experiment, the trajectory tracking performance of the unmanned tracked vehicle under double lane change conditions was verified only at low speed. The expected speed was set to 10.8 km/h; the experimental results are shown in Figure 17 and photos of the experiment in Figure 18. Figure 17(a) and (b) show that the unmanned tracked vehicle using the proposed MPC based on the dynamics prediction model achieves good trajectory and speed tracking accuracy.

Conclusion

To address the poor tracking accuracy of traditional MPC trajectory tracking strategies based on kinematics prediction models, which neglect the coupling of longitudinal and lateral dynamics, an MPC trajectory tracking strategy that integrates kinematics and dynamics models is proposed. It achieves integrated control of trajectory tracking and handling stability based on yaw rate control. The maximum lateral deviation and heading angle deviation were reduced by 58.4% and 18.6%, respectively.

The weight matrix of the objective function in traditional MPC controllers is constant. However, as measurable disturbances such as the reference trajectory and unmeasurable disturbances such as the external environment change continuously, a fixed weight matrix cannot guarantee optimal control performance throughout the entire working process. A weight matrix adaptive adjustment method based on a deep Q-network (DQN) reinforcement learning strategy is therefore proposed for the MPC that solves for the optimal control torque. It further reduced the maximum lateral deviation and heading angle deviation by 36.6% and 19.7%, respectively.

Footnotes

Handling Editor: Aarthy Esakkiappan

ORCID iD

Baichuan Shi

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by National Key R&D Project under Grant 2022YFB2502702.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.


Model-based MPC with adaptive weights for tracked vehicle trajectory tracking
