Abstract
Trajectory tracking is crucial for intelligent tracked vehicles, with model predictive control (MPC) being a widely used method due to its ability to handle predictions and constraints. However, conventional MPC based on kinematic models neglects the coupled longitudinal–lateral dynamics, leading to limited accuracy and stability. To address this, we propose an MPC strategy that integrates both kinematic and dynamic models for dual-motor-driven tracked vehicles. This approach uses lateral deviation, heading deviation, longitudinal velocity, and yaw rate as state variables, with motor torques as control inputs, explicitly capturing dynamic coupling and electric drive characteristics. Additionally, we introduce a deep Q-network (DQN)-based adaptive weight adjustment scheme to improve disturbance rejection and overcome the limitations of fixed MPC weights. This adaptive mechanism optimizes the weight matrix online under varying operating conditions. The proposed method is validated through MATLAB/Simulink–RecurDyn co-simulation and low-speed vehicle tests, demonstrating significant improvements in trajectory tracking, with reductions in lateral and heading deviations by 58.4% and 18.6%, respectively. Further, the DQN-based adaptive weighting leads to additional improvements, reducing lateral and heading deviations by 36.6% and 19.7%.
Keywords
Introduction
Tracked vehicles, with their superior off-road capability, high load capacity, low ground pressure, and zero-radius turning, are widely deployed in military, agricultural, engineering rescue, and emergency response applications. 1 Compared with wheeled vehicles, dual-motor independently driven tracked vehicles differ significantly in drivetrain structure and steering mechanism: steering is achieved by generating speed or torque differences between the two tracks. Under real-world operating conditions, they exhibit pronounced longitudinal–lateral dynamic coupling, complex track–ground interactions, and high slip ratios, posing considerable challenges to high-precision trajectory tracking and maneuvering stability control.2,3
Trajectory tracking control methods mainly include the pure pursuit algorithm,4 feedforward-feedback control,5 PID control,6,7 the linear quadratic regulator (LQR), and model predictive control (MPC). In recent years, machine-learning algorithms have also gradually been applied to trajectory tracking control.8 The pure pursuit algorithm is based on a simplified steering kinematics model. It has a low computational cost, can be combined with feedforward-feedback control, and tracks well at low speed on smooth road surfaces. The PID algorithm is simple, easy to tune, and robust, but it is only applicable to decoupled linear time-invariant systems and is of limited use in nonlinear systems with strict real-time requirements. LQR assumes a linear time-invariant vehicle model, and its control performance depends on an accurate mathematical model. It cannot guarantee robustness and stability under time-varying parameters or disturbances, and it tracks poorly on sections with sudden curvature changes.9 MPC based on a vehicle model can effectively solve multi-objective, multi-constraint, multivariable optimization problems. With its rolling optimization and feedback correction, it can reduce or even eliminate the impact of time delay in the closed-loop system.10–12 At present, MPC is mostly used for trajectory tracking of intelligent wheeled vehicles. However, the transmission system of a dual-motor-driven tracked vehicle differs significantly from that of a wheeled vehicle, and a tracked vehicle steers by changing the speed or torque of the tracks on both sides, so its steering principle is also very different.
The dynamics of a tracked vehicle are nonlinear and time-varying in both the lateral and longitudinal directions, which makes MPC well suited to its trajectory tracking control. Compared with wheeled vehicles, however, there has been relatively little research on MPC trajectory tracking control for tracked vehicles.
The core components of MPC are the prediction model, rolling optimization, and feedback correction. The accuracy and complexity of the prediction model affect both the computational efficiency and the trajectory tracking accuracy. Li13 and Zhou14 designed trajectory tracking MPC controllers based on a kinematics prediction model of the tracked vehicle, which track the trajectory by controlling the position deviation and heading angle deviation, with the speeds of the left and right tracks as the control variables. Burke15 proposed a kinematics prediction model that does not consider slip; the resulting MPC controller uses the track speed and the yaw rate as control variables to track the vehicle position and heading angle. Hu et al.16 considered the effect of slip and used a kinematics model based on the instantaneous steering center as the prediction model, with the winding speeds of the left and right tracks as the control variables, and gave constraint expressions for the control and state variables. Because dynamics constraints are not considered, the trajectory tracking deviation in the above studies is significant. The prediction models in Tang et al.,17 Zhao et al.,18 and Lu et al.19 are the same as that in Hu et al.16 However, Tang et al.17 conducted a dynamics analysis and added constraints on the track winding speed, on the vehicle acceleration under ground adhesion conditions, and on the speed control increment, resulting in higher trajectory tracking accuracy. Lu et al.19 conducted a steering dynamics analysis.
To ensure that the tracked vehicle does not roll over, speed constraints are set, and the longitudinal offset of the steering center is constrained so that steering control is not lost. This addresses the low trajectory tracking accuracy caused by the steering slip of dual-motor-driven high-speed tracked vehicles in off-road conditions. However, constraining only the longitudinal velocity and acceleration cannot guarantee the lateral tracking accuracy of the tracked vehicle under many working conditions.
Recent studies have incorporated higher-fidelity models into MPC. Zhang et al.20 established a dynamic model of dual-motor-driven tracked vehicles that explicitly accounts for track–ground interaction and motor torque coupling, while Chen et al.21 proposed a dynamic MPC framework capable of handling coupled dynamics and terrain uncertainty. Similarly, Elsharkawy et al.22 analyzed the dynamic response of tracked vehicles under slope and nonuniform turning conditions, highlighting the necessity of accurate coupled modeling. These works confirm that dynamic-model-based MPC can significantly improve tracking performance under challenging operating conditions. This paper proposes an MPC trajectory tracking control method that integrates the kinematics and dynamics prediction models.
Another challenge is that MPC cost function weights are typically set empirically and remain fixed, which hinders adaptability across varying speeds, path curvatures, and adhesion levels. Adaptive methods such as fuzzy tuning or Bayesian optimization have been introduced,23,24 but online parameter selection methods increase the online computational burden of the MPC to some extent, while offline parameter optimization methods require a large amount of data and cannot adapt to changing operating conditions. Reinforcement learning has significant advantages in online parameter optimization and has become a current development trend.25,26
Motivated by these challenges, this paper proposes a trajectory-tracking strategy that integrates a dynamic prediction model with MPC and introduces a deep Q-network (DQN)-based mechanism for online weight adaptation. The contributions are summarized as follows:
We develop a dynamic-model-based MPC framework for dual-motor-driven tracked vehicles, in which lateral deviation, yaw deviation, longitudinal velocity, and yaw rate are selected as state variables and track-side motor torques are used as control inputs. This enables the controller to explicitly account for longitudinal–lateral coupling and electric drive dynamics in both prediction and optimization.
We design a DQN-based adaptive weight adjustment scheme for the MPC cost function. Instead of relying on fixed or heuristically tuned weights, the proposed scheme learns an online mapping from vehicle states and reference information to the optimal weight matrix, thereby improving tracking accuracy and robustness under different speeds, curvatures, and adhesion levels.
The proposed approach is validated through MATLAB/Simulink–RecurDyn co-simulation and low-speed experiments, showing that dynamic-model-based MPC significantly outperforms kinematic MPC in terms of maximum lateral and yaw deviations, while DQN-based adaptation further enhances robustness and tracking performance.
Kinematics and dynamics modeling of tracked unmanned vehicle driven by dual motors
Electronic differential steering is adopted in the dual-motor-driven tracked vehicle, eliminating the need for a mechanical or hydraulic steering mechanism. Steering is achieved by controlling the speed or torque of the motors on both sides, which changes the steering mode and mechanism. To design an effective model predictive trajectory control strategy, it is therefore necessary to rebuild the time-varying nonlinear kinematics and dynamics models under electronic differential steering. The chassis structure of the dual-motor-driven tracked vehicle is shown schematically in Figure 1.

Schematic diagram of the chassis structure of a dual motor driven tracked vehicle.
Steering kinematics analysis
As shown in Figure 2, the tracked vehicle may slide laterally while steering, especially at high speeds (>0.8 rad/s), so that the velocity direction no longer coincides with the body axis. In Figure 2, XOY is the global coordinate system, φ is the heading angle, β is the sideslip angle at the center of mass, o is the instantaneous turning center of the vehicle, v is the velocity of the vehicle centroid, R is the turning radius of the vehicle, and ω is the yaw rate. v_L and n_L are the linear speed of the outer track and the rotational speed of its driving wheel, and v_R and n_R are the corresponding quantities for the inner track. B is the center distance between the two tracks, L is the ground contact length of the track, and r is the radius of the driving wheel.

Kinematics analysis of tracked vehicles.
(O–XY) denotes the global coordinate frame, φ is the heading angle, β is the sideslip angle at the vehicle center of mass, R is the turning radius, v is the centroid velocity, ω is the yaw rate, B is the track gauge, and L is the track-grounding length.
The change in β is assumed to be negligible. This assumption is valid for low- to medium-speed trajectory tracking with moderate curvature, where the sideslip angle β varies slowly and remains within a small range. In such cases, β can be treated as a bounded disturbance in the kinematic error dynamics. For higher-speed or more aggressive steering maneuvers, β is estimated online and explicitly incorporated into the dynamic prediction model as a measurable disturbance. The kinematics model of the tracked vehicle can then be constructed as follows:
When analyzing the steering motion of the tracked vehicle, it is necessary to consider the sliding phenomenon of the tracks. The slip rate of the track can be expressed as follows:
where s1 and s2 are the slip rates of the outer and inner tracks, ω1 and ω2 are the angular velocities of the outer and inner driving wheels, and v1 and v2 are the linear speeds of the outer and inner tracks, respectively.
The relationship between v1, v2 and ω1, ω2 is therefore:
If
Substituting equation (3) into equation (4) gives:
where the slip parameters can be estimated from the state feedback
where i is the reduction ratio from the motor to the driving wheel.
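The relations in this subsection can be illustrated numerically. The following Python sketch is not the formulation of this paper: it assumes the common slip definition s = (rω − v)/(rω), so that v = rω(1 − s), and takes the yaw rate as the track speed difference divided by B. The function names and all numerical values are hypothetical.

```python
import math

def track_speeds(omega1, omega2, s1, s2, r):
    """Ground speeds of the outer (1) and inner (2) tracks.

    Assumes slip s = (r*omega - v) / (r*omega), i.e. v = r*omega*(1 - s).
    """
    return r * omega1 * (1.0 - s1), r * omega2 * (1.0 - s2)

def body_motion(v1, v2, B):
    """Centroid speed and yaw rate for a center distance B between the tracks."""
    v = 0.5 * (v1 + v2)
    omega = (v1 - v2) / B
    return v, omega

def motor_speed_rpm(omega_wheel, i):
    """Motor speed in r/min for a reduction ratio i from motor to driving wheel."""
    return omega_wheel * i * 60.0 / (2.0 * math.pi)

# Illustrative values only (not the parameters of the tested vehicle).
v1, v2 = track_speeds(omega1=12.0, omega2=8.0, s1=0.05, s2=0.02, r=0.25)
v, omega = body_motion(v1, v2, B=1.6)
print(f"v1={v1:.3f} m/s, v2={v2:.3f} m/s, v={v:.3f} m/s, yaw rate={omega:.4f} rad/s")
```

With equal slip on both sides the yaw rate depends only on the commanded speed difference; unequal slip shifts both the centroid speed and the yaw rate, which is why the slip parameters have to be estimated from feedback.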
Dynamics analysis during straight-line driving
The force balance equation for the straight-line traveling of the tracked vehicle is as follows:
where G is the total vehicle weight (N), M is the vehicle mass (kg), F_t is the required traction force of the entire vehicle (N),
The F_t required for the entire vehicle is limited by the output torque
Dynamic analysis during steering
The tracked vehicle steers by adjusting the driving torques of the two motors to regulate the speeds of the tracks on both sides. The drive motors can operate in four quadrants, and the typical steering conditions for the dual-motor independently driven tracked vehicle are shown in Figure 3.

The typical steering conditions for the dual motor independent drive tracked vehicle and the operating characteristics of the four quadrants of the motor.
As shown in Figure 4, the steering speed of the tracks on both sides is different, and the direction of the longitudinal friction force F fi is opposite to the direction of the track speed, namely:
where μ is the friction resistance coefficient between the track and the ground, and sign is the sign function:

Schematic diagram of steering force analysis for the tracked vehicle: (a) turning at low speed and (b) turning at high speed.
According to the normal load per unit length on the grounding area of the track, the lateral friction resistance of the corresponding grounding area track can be obtained.
The friction coefficient between the track and the ground is μ, which is approximately equal to the friction coefficient between the track shoe and the ground. The normal load per unit length on the ground contact section is p, and d is the offset of the center of mass during high-speed turning. The lateral frictional resistance of the front and rear sections of the track during steering can be expressed as:
The steering resistance moment M y caused by the lateral frictional resistance is:
The resistance moment M f caused by longitudinal friction is expressed as:
Neglecting centrifugal force, the total steering resistance moment of tracked vehicles during steering M zu is:
The constraint expression for the resistance coefficient is as follows:
According to Newton’s Second Law, the lateral, longitudinal, and yaw movements during track steering can be expressed as:
Where,
f is defined as the coefficient of vehicle driving resistance. By substituting the equations (9), (10), and (13) into the equation (15), the following expression can be obtained:
Preview trajectory tracking MPC control strategy
Model predictive control (MPC)
The design of a model predictive controller includes three parts: establishing a prediction model, constructing an objective function, and obtaining the optimal control quantities. The design process is shown in Figure 5.

MPC design framework.
In Figure 5, y(t) is the actual output of the controlled object at the current time, T is the sampling period of the MPC controller, N_p is the prediction horizon, N_c is the control horizon,
According to the feedback current state estimation value
The MPC based on kinematics of tracked vehicles
The establishment and discretization of prediction models
According to Figure 2, the preview characteristics of the driver are considered first. The preview distance is set to a constant L_d, and the reference trajectory point P_ref is found. Its coordinates in the global coordinate system are
From Figure 2, the state equation of the error model is obtained as follows:
The lateral deviation y e and the heading angle deviation φ e are selected as the state variables, with the yaw rate ω as the control variable. v x is the actual longitudinal speed at the current moment, the curvature of the reference point ρ is selected as the measurable disturbance, y e and φ e are selected as the output variables. The equation (17) is written in the form of state space and expressed as follows:
Where
The continuous state-space form of equation (17) is discretized as follows:
Where,
Furthermore, in order to improve computational efficiency, the above parameters are simplified:
In summary, the output equation of the entire discrete system at time t is:
Where,
When
The selection of N_P has a great impact on the accuracy of trajectory tracking control, while N_C directly determines the dimension of the optimal control sequence and therefore the computational efficiency. In this paper, based on equations (21) and (22) and further tuning in simulation, N_P is set to 15 and N_C to 3.
Depending on the controlled object, Pa is generally set to 1/4–1/3, and tF is the dynamic response time of the system.
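To make the role of N_P and N_C concrete, the following Python sketch builds the stacked prediction matrices for an incremental-form MPC. It is only an illustration under stated assumptions: the error model is taken as ẏ_e = v_x φ_e and φ̇_e = ω (the curvature disturbance term is omitted), the discretization is forward Euler, and the names `prediction_matrices`, `Psi` and `Theta` are hypothetical rather than the notation of equations (18) to (20).

```python
import numpy as np

def prediction_matrices(v_x, T, Np=15, Nc=3):
    """Stacked prediction Y = Psi @ xi + Theta @ dU for the augmented state
    xi = [y_e, phi_e, omega_prev] and control increments dU (length Nc)."""
    A = np.array([[0.0, v_x], [0.0, 0.0]])   # assumed error dynamics
    B = np.array([[0.0], [1.0]])             # control input: yaw rate
    Ad = np.eye(2) + T * A                   # forward-Euler discretization
    Bd = T * B
    # Augment with the previous control so that the input becomes an increment.
    At = np.block([[Ad, Bd], [np.zeros((1, 2)), np.eye(1)]])
    Bt = np.vstack([Bd, np.eye(1)])
    Ct = np.hstack([np.eye(2), np.zeros((2, 1))])
    powers = [np.eye(3)]
    for _ in range(Np):
        powers.append(At @ powers[-1])
    Psi = np.vstack([Ct @ powers[i] for i in range(1, Np + 1)])
    Theta = np.zeros((2 * Np, Nc))
    for i in range(1, Np + 1):
        for j in range(min(i, Nc)):
            Theta[2 * (i - 1):2 * i, j:j + 1] = Ct @ powers[i - 1 - j] @ Bt
    return Psi, Theta

Psi, Theta = prediction_matrices(v_x=5.0, T=0.05)
print(Psi.shape, Theta.shape)
```

The number of columns of `Theta` equals N_C, so the size of the optimization problem grows with the control horizon, while N_P only adds rows to the prediction.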
Objective function and constraints
Firstly, in order to improve the trajectory tracking accuracy of the controller, it is desired that the lateral deviation and heading angle deviation of the controlled vehicle tend to zero. The objective function is set as follows:
Where,
Equation (13) is written in matrix form as follows:
Where,
In addition, it is usually necessary to take into account the rate of change of the control variable, and the second term of the objective function is set to:
Where, R is the weight matrix of the MPC controller control term. J2 is generally used to describe the comfort level of riding, but the rate of change of the control variable can also affect the control effect to some extent. For example, an excessively large rate of change of the control variable can easily lead to overshoot during trajectory tracking.
Combining J1 and J2, the objective function of the model predictive controller is designed as follows:
Where,
This objective function combines three terms. The first ensures the tracking accuracy of the predicted output with respect to its reference value, that is, the desired path. The second keeps the control increments as small as possible so that the vehicle turns smoothly. The third is a relaxation term.
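The three terms described above can be evaluated as in the following Python sketch. It assumes the usual quadratic form, a weighted tracking error summed over the prediction horizon, a weighted control increment summed over the control horizon, and a penalty ρε² on the relaxation factor; the function name `mpc_cost` and the numerical values are hypothetical.

```python
import numpy as np

def mpc_cost(Y_pred, Y_ref, dU, Q, R, eps=0.0, rho=1.0):
    """J = sum_k ||y_k - y_ref,k||_Q^2 + sum_k ||du_k||_R^2 + rho * eps^2.

    Y_pred, Y_ref: (Np, ny) predicted and reference outputs.
    dU: (Nc, nu) control increments. Q, R: weight matrices.
    """
    e = np.asarray(Y_pred, dtype=float) - np.asarray(Y_ref, dtype=float)
    dU = np.asarray(dU, dtype=float)
    J1 = np.einsum("ki,ij,kj->", e, Q, e)      # tracking term
    J2 = np.einsum("ki,ij,kj->", dU, R, dU)    # control-increment term
    return J1 + J2 + rho * eps ** 2            # plus relaxation term

# Illustrative call: two prediction steps, outputs [y_e, phi_e].
J = mpc_cost(
    Y_pred=[[1.0, 0.0], [0.0, 2.0]],
    Y_ref=[[0.0, 0.0], [0.0, 0.0]],
    dU=[[1.0], [2.0]],
    Q=np.diag([2.0, 3.0]),
    R=np.array([[0.5]]),
    eps=0.1,
    rho=10.0,
)
print(J)
```

Raising the entries of Q relative to R trades smoother control for tighter tracking, which is the trade-off that the adaptive weighting later adjusts online.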
Due to the lack of consideration of dynamics factors and motor characteristics in the kinematics model, only soft constraints are applied to state variables and control variables:
Where, the lateral deviation at each moment shall not be too large, which is taken as
The optimal solution
The objective function is minimized by a quadratic programming algorithm, and equation (26) is transformed into the following standard quadratic form:
From the equation (30), the optimal control increment sequence
The MPC based on dynamics of tracked vehicles
The establishment and discretization of prediction models
The kinematic preview model in equation (17) provides the path-related error states, including the lateral deviation y e and the heading deviation φ e , whereas the dynamic model in equation (16) captures the coupled longitudinal–lateral dynamics through the longitudinal velocity v x and the yaw rate ω. In the proposed controller, these four quantities are stacked into an augmented state vector.
Based on the kinematics analysis of the tracked vehicle steering in Section 2, the prediction equation based on dynamics is constructed as follows:
where y_e, φ_e, ω, and v_x are selected as the state variables, the driving torque
Where,
Since the dynamic model (equation (32)) is nonlinear, a successive linearization strategy is adopted. At each sampling instant, the model is linearized around the current state and input,
The resulting local linear time-varying (LTV) model is then discretized via the backward difference method and used as the prediction model in the MPC formulation.
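The successive linearization and backward-difference discretization can be sketched in Python as follows. The function `toy_dynamics` is a hypothetical placeholder with the same state and input layout (states y_e, φ_e, ω, v_x; inputs left and right torques); it is not equation (32), and its parameters are invented for illustration. The Jacobians are obtained numerically by central differences, which is one possible choice and an assumption here.

```python
import numpy as np

def toy_dynamics(x, u, m=500.0, Iz=300.0, B=1.0, r=0.15, c_w=50.0, c_v=100.0):
    """Hypothetical stand-in for the nonlinear model x_dot = f(x, u)."""
    y_e, phi_e, omega, v_x = x
    F_l, F_r = u[0] / r, u[1] / r          # track forces from torques
    return np.array([
        v_x * np.sin(phi_e),
        omega,
        ((F_r - F_l) * B / 2.0 - c_w * omega) / Iz,
        (F_l + F_r - c_v * v_x) / m,
    ])

def linearize(f, x0, u0, h=1e-6):
    """Return A, B, c with f(x, u) ~ A x + B u + c near (x0, u0)."""
    x0 = np.asarray(x0, dtype=float)
    u0 = np.asarray(u0, dtype=float)
    A = np.zeros((x0.size, x0.size))
    B = np.zeros((x0.size, u0.size))
    for j in range(x0.size):
        d = np.zeros(x0.size)
        d[j] = h
        A[:, j] = (f(x0 + d, u0) - f(x0 - d, u0)) / (2.0 * h)
    for j in range(u0.size):
        d = np.zeros(u0.size)
        d[j] = h
        B[:, j] = (f(x0, u0 + d) - f(x0, u0 - d)) / (2.0 * h)
    c = f(x0, u0) - A @ x0 - B @ u0        # affine offset of the linearization
    return A, B, c

def discretize_backward(A, B, c, T):
    """Backward difference: (x[k+1] - x[k]) / T = A x[k+1] + B u[k] + c."""
    M = np.linalg.inv(np.eye(A.shape[0]) - T * A)
    return M, M @ (T * B), M @ (T * c)

# Re-linearize at the current operating point at every sampling instant.
A, B, c = linearize(toy_dynamics, x0=[0.0, 0.0, 0.0, 2.0], u0=[10.0, 10.0])
Ad, Bd, cd = discretize_backward(A, B, c, T=0.05)
print(np.round(Ad, 4))
```

Because the linearization point changes at every step, `Ad`, `Bd` and `cd` have to be recomputed before each optimization, which is what makes the prediction model time-varying.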
Objective function construction and constraints
Because a coupled lateral–longitudinal control strategy is used, the tracking targets and predicted outputs include not only the lateral deviation and heading angle deviation but also the longitudinal velocity, that is,
The objective function of the trajectory tracking control based on dynamics prediction model is:
Where,
Next, the constraints of the trajectory tracking MPC controller are established and include output constraints, control increment constraints, and state variable constraints. The constraints on state variables and output variables are similar to equations (25) to (27).
The constraints on control variables need to be combined with the mechanical structure of the tracked vehicle based on motor characteristics, and the driving torque
Where,
So,
Taking all the constraints established above into account, equation (34) is simplified into the standard form of quadratic programming:
Finally, the optimal output torque increment sequence is solved. Its first element is added to the control variable at t−1 to obtain the optimal control variable at t, which is then applied to the controlled vehicle.
The optimal control quantity is determined by minimizing the objective function (equation (37)) while satisfying the constraints. This is a convex optimization problem under the given constraints and is solved by quadratic programming.
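The receding-horizon step can be sketched in Python as follows. For self-containment the quadratic program is solved by a simple projected-gradient iteration that handles only box constraints on the increments; this is a stand-in for a dedicated QP solver and not the solver used in this work. The names `solve_box_qp`, `mpc_step`, `Theta`, `Q_bar`, `R_bar` and `E` (the predicted free-response error) are hypothetical.

```python
import numpy as np

def solve_box_qp(H, g, lb, ub, iters=500):
    """Minimize 0.5 x'Hx + g'x subject to lb <= x <= ub (H positive definite)."""
    step = 1.0 / np.linalg.eigvalsh(H).max()   # 1 / Lipschitz constant of the gradient
    x = np.clip(np.zeros_like(g, dtype=float), lb, ub)
    for _ in range(iters):
        x = np.clip(x - step * (H @ x + g), lb, ub)   # gradient step, then projection
    return x

def mpc_step(Theta, Q_bar, R_bar, E, u_prev, du_max):
    """One receding-horizon update.

    Cost: (E + Theta dU)' Q_bar (E + Theta dU) + dU' R_bar dU, with |dU| <= du_max.
    Returns the control applied at time t and the full increment sequence.
    """
    H = 2.0 * (Theta.T @ Q_bar @ Theta + R_bar)
    g = 2.0 * Theta.T @ Q_bar @ E
    dU = solve_box_qp(H, g, -du_max, du_max)
    return u_prev + dU[0], dU               # only the first increment is applied

# Illustrative call with a two-step horizon and a scalar control.
u_t, dU = mpc_step(
    Theta=np.eye(2), Q_bar=np.eye(2), R_bar=np.eye(2),
    E=np.array([-1.0, 0.0]), u_prev=0.3, du_max=0.2,
)
print(u_t, dU)
```

In this call the unconstrained optimum of the first increment is 0.5, so the increment limit of 0.2 is active and the applied control is 0.3 + 0.2.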
The block diagram of the final proposed control strategy is shown in Figure 6.

MPC-based trajectory tracking control framework with DQN-based weight adaptation.
Adaptive adjustment method of objective function weight matrix based on DQN
Q-learning is a classic reinforcement learning algorithm. By continuously interacting with the environment, the agent learns better behavioral strategies and gradually updates the state–action table (Q-table), whose entries represent the maximum expected reward that can be obtained when executing an action in a given state. However, when the environment is high-dimensional and has many states, storing the state–action value function in a Q-table suffers from the curse of dimensionality.
To address this issue, the DeepMind team integrated deep neural networks with the Q-learning algorithm and proposed the deep Q-network (DQN) algorithm. DQN replaces the Q-table with a deep neural network f and approximates the state–action value function by iteratively updating the network parameters θ. However, the learning target in DQN depends on the parameters being learned, which biases the Q network toward actions whose Q values are already high and makes the algorithm unstable. To solve this problem, the double deep Q-network (DDQN) adds a target Q network on top of the original DQN algorithm. The DDQN training process is shown in Figure 7.

DDQN algorithm training process with experience replay buffer.
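The role of the target network can be illustrated with the temporal-difference targets below. This Python sketch shows two common variants: a target evaluated entirely with the target network, and the double-DQN form as it is usually described in the reinforcement-learning literature, where the online network selects the action and the target network evaluates it. Which variant corresponds exactly to Figure 7 is not stated here, so both are given as assumptions; the function names and numbers are hypothetical.

```python
import numpy as np

def td_target(reward, q_next_target, gamma=0.99, done=False):
    """Target from the target network: r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward
    return reward + gamma * float(np.max(q_next_target))

def double_dqn_target(reward, q_next_online, q_next_target, gamma=0.99, done=False):
    """Double-DQN target: the online network picks a*, the target network scores it."""
    if done:
        return reward
    a_star = int(np.argmax(q_next_online))
    return reward + gamma * float(q_next_target[a_star])

# Illustrative Q-values for three actions in the next state.
q_online = np.array([1.0, 3.0, 2.0])
q_target = np.array([5.0, 0.5, 4.0])
print(td_target(-1.0, q_target, gamma=0.9))
print(double_dqn_target(-1.0, q_online, q_target, gamma=0.9))
```

In this example the plain maximum picks the largest (possibly overestimated) target value, whereas the decoupled form uses the value of the action preferred by the online network, which is the mechanism that reduces the bias toward already high Q values.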
The objective of this study is to calculate the optimal control quantity based on the feedback information from tracked vehicles and the reference trajectory and speed established in the previous planning layer, in order to obtain the optimal tracking accuracy. The solution of the optimal control sequence in MPC strategy is transformed into a quadratic programming problem.
Therefore, the task of solving the optimal control quantity within each time step can be divided into two small subtasks:
1. Determine the optimal weight matrix.
2. Solve the quadratic programming problem with the weight matrix held fixed.
Based on the DQN algorithm, the adaptive adjustment of the weight matrix of MPC is achieved. At each calculation time step, the current optimal value is given, and the online update of the Q network is synchronously completed.
The specific steps of the adaptive adjustment method for the weights of MPC based on DQN are as follows:
Step 1: The tracked unmanned vehicle is regarded as an agent, and the external conditions of the tracked vehicle and some body characteristics (such as track material, mechanical structure, etc.) are regarded as environment. The action space, the state space, the state-transition equation and the reward function in the DQN algorithm are defined.
1. State space
The state of the tracked unmanned vehicle at time t is defined, including the current lateral deviation, heading angle deviation, speed error, and the original weight matrix:
2. Action space
The actions within each time step are adjustments to the weight, therefore the action space is defined as:
Where
Among them,
3. State-transition equation
As an action
Subsequently, the optimal motor driving torque is obtained and applied to the tracked unmanned vehicle, causing a state transition as follows:
where f real represents the feedback state of the actual tracked unmanned vehicle under the given driving torque.
4. Reward function
The ultimate goal of weight adaptive adjustment is to achieve optimal tracking accuracy, so a negative root mean square error is designed as a reward:
Step 2: Set up the Q network and target Q network architectures. The two networks have the same architecture and differ only in how often their internal parameters are updated. The hidden part of the Q network is a feedforward neural network consisting of two fully connected layers with 16 and 8 units, respectively, using the ReLU activation function. The input layer dimension is 7, determined by s, and the output layer dimension is 1. Assuming that the network parameter of the Q network at time t is Q_t and the network parameter of the target Q network is
Step 3: Establish a state buffer
The Q value under Q network is defined as
where α is the learning rate, which controls the size of each parameter update. A small α (such as 10^-4 or 10^-5) is usually used and gradually reduced during training to balance learning speed against stability.
The above steps complete the update of the Q network at time t. Every k time steps, the target Q network is updated once as follows:
The pseudocode of the algorithm for adaptive adjustment of MPC weights based on DQN is shown in Algorithm 1.
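The steps above can be sketched as the following Python training loop. It is a minimal illustration under several assumptions that are not taken from this paper: the 7-dimensional state is assumed to consist of three tracking errors plus four weights, the network outputs one Q value per discrete action (rather than a single output), only the first weight is adjusted, and `toy_vehicle` is a hypothetical placeholder for the MPC solve and the vehicle response. All hyperparameters are illustrative.

```python
import random
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
random.seed(0)

class QNet:
    """MLP 7 -> 16 -> 8 -> n_actions with ReLU hidden layers."""
    def __init__(self, n_in=7, n_actions=3):
        sizes = [n_in, 16, 8, n_actions]
        self.W = [rng.normal(0.0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def forward(self, s):
        acts = [np.asarray(s, dtype=float)]
        for i, (W, b) in enumerate(zip(self.W, self.b)):
            z = acts[-1] @ W + b
            acts.append(z if i == len(self.W) - 1 else np.maximum(z, 0.0))
        return acts

    def q(self, s):
        return self.forward(s)[-1]

    def sgd_step(self, s, a, target, lr):
        """One gradient step on 0.5 * (Q(s, a) - target)^2."""
        acts = self.forward(s)
        grad = np.zeros_like(acts[-1])
        grad[a] = acts[-1][a] - target
        for i in reversed(range(len(self.W))):
            grad_in = self.W[i] @ grad               # gradient w.r.t. layer input
            self.W[i] -= lr * np.outer(acts[i], grad)
            self.b[i] -= lr * grad
            grad = grad_in * (acts[i] > 0) if i > 0 else grad_in  # ReLU derivative

    def copy_from(self, other):
        self.W = [w.copy() for w in other.W]
        self.b = [b.copy() for b in other.b]

def toy_vehicle(weights):
    """Placeholder for MPC + vehicle: errors shrink as weights[0] approaches 5."""
    e_y = 0.1 * abs(weights[0] - 5.0) + 0.01 * rng.standard_normal()
    return np.array([e_y, 0.5 * e_y, 0.1 * e_y])

def reward_fn(errors):
    """Negative root mean square of the tracking errors."""
    return -float(np.sqrt(np.mean(np.square(errors))))

def train(steps=300, gamma=0.9, lr=1e-3, eps=0.2, batch=16, k_sync=20):
    deltas = np.array([-0.5, 0.0, 0.5])              # discrete weight adjustments
    q_net, q_target = QNet(), QNet()
    q_target.copy_from(q_net)
    buffer = deque(maxlen=1000)                      # experience replay buffer
    weights = np.array([1.0, 1.0, 1.0, 0.1])
    state = np.concatenate([toy_vehicle(weights), weights])
    for t in range(1, steps + 1):
        if rng.random() < eps:                       # epsilon-greedy exploration
            a = int(rng.integers(len(deltas)))
        else:
            a = int(np.argmax(q_net.q(state)))
        weights = weights.copy()
        weights[0] = np.clip(weights[0] + deltas[a], 0.1, 10.0)
        errors = toy_vehicle(weights)
        next_state = np.concatenate([errors, weights])
        buffer.append((state, a, reward_fn(errors), next_state))
        state = next_state
        for s, a_b, r, s2 in random.sample(list(buffer), min(batch, len(buffer))):
            y = r + gamma * float(np.max(q_target.q(s2)))   # target from target net
            q_net.sgd_step(s, a_b, y, lr)
        if t % k_sync == 0:                          # periodic target-network update
            q_target.copy_from(q_net)
    return q_net, q_target, weights

q_net, q_target, final_weights = train()
print(final_weights)
```

In a real implementation the call to `toy_vehicle` would be replaced by solving the MPC problem with the adjusted weights and reading back the measured deviations from the vehicle or the co-simulation.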
Simulation and experiment
MATLAB/Simulink–RecurDyn joint simulation
The dual-motor-driven tracked vehicle is a typical multivariable nonlinear system, and an analytical mathematical model can hardly describe its actual dynamic behavior. The RecurDyn multi-body simulation software is therefore used to build a 3D virtual model of the tracked vehicle and a road surface model. The dual-motor-driven tracked vehicle model in RecurDyn is shown in Figure 8, and its dynamics parameters are listed in Table 1. The trajectory tracking control strategy model shown in the figure is built in MATLAB/Simulink. Through co-simulation of RecurDyn and MATLAB/Simulink, trajectory tracking control that accounts for the actual vehicle dynamics and operating behavior is achieved, as shown in Figure 9. The inputs of the RecurDyn module are the torques of the driving wheels on both sides, and the outputs include the driving wheel speeds, the vehicle speed, the trajectory of the vehicle center, the yaw rate, and the longitudinal speeds of the tracks.

The tracked vehicle model in RecurDyn.
Dynamics parameters of the tracked vehicle model in RecurDyn.

MATLAB/Simulink–RecurDyn joint simulation platform.
Simulation results
To verify the advantages in control accuracy and stability of the proposed trajectory tracking control strategy based on the dynamics prediction model over the traditional kinematics-based control strategy, a double-lane-change trajectory tracking test is designed under the conditions of vehicle speeds of
1. v = 18 km/h, μ = 0.85
The screenshot of the visualized trajectory dynamic tracking simulation in RecurDyn is shown in Figure 11, and the simulation results of double shift lane tracking are shown in Figure 12.

Screenshot of MPC trajectory tracking visualization simulation based on dynamics prediction model,

Screenshot of the MPC trajectory tracking visualization simulation based on dynamics prediction model,

Comparison of simulation results of the MPC control based on kinematics/dynamics prediction model at
From Figure 13(a) to (c), it can be seen that there is a significant lateral deviation and heading angle deviation when the tracked vehicle turns and changes lanes at t = 9.5 s, t = 13.5 s, and t = 20 s. The maximum lateral deviation of the MPC control strategy based on the kinematics prediction model is 0.89 m, and the maximum heading angle deviation is 0.13 rad. The maximum lateral deviation of the control strategy based on the dynamics prediction model is 0.21 m, and the maximum heading angle deviation is

Comparison of simulation results of the MPC control based on kinematics/dynamics prediction model at
Figure 13(d) shows that the control strategy based on the dynamics prediction model has a smaller yaw rate, and the controlled tracked unmanned vehicle turns more smoothly.
From Figure 13(e), it can be seen that due to the use of lateral and longitudinal decoupling methods in the kinematics control strategy, the deviation between the output speed and the expected vehicle speed is relatively small. The dynamics based MPC control strategy considers lateral and longitudinal coupling. When the lateral deviation and heading angle deviation are large, multi-objective optimization is used to balance and adjust the longitudinal speed in real time. Therefore, the longitudinal speed will be slightly reduced when the vehicle turns.
2.
The screenshot of the visualized trajectory dynamic tracking simulation in RecurDyn is shown in Figure 11, and the simulation results of double shift lane tracking are shown in Figure 12.
From Figure 12(a) to (c), it can be seen that there is a significant lateral deviation and heading angle deviation when the tracked vehicle turns and changes lanes at times
As shown in Figure 12(d), the yaw rate of the trajectory tracking strategy based on the kinematics prediction model exceeds the limit of
As shown in Figure 12(e), the proposed MPC trajectory tracking control strategy based on the dynamics prediction model can ensure tracking accuracy by appropriately adjusting the vehicle speed through multi-objective optimization algorithms.
3. Adaptive weight adjustment based on DQN:
In order to validate the proposed DQN based MPC weight adaptive adjustment method, under the expected vehicle speed of

Convergence curves.

Comparison of the MPC trajectory tracking results of dynamics prediction models with fixed weight and adaptive weight adjustment: (a) trajectory, (b) lateral position deviation, (c) heading angle deviation, (d) yaw rate and (e) vehicle speed.
As shown in Figure 14, the average loss decreases rapidly in the initial phase and stabilizes at a minimal value after about 400 episodes, indicating that the prediction error is minimized. Concurrently, the average total reward rises steadily and converges to a stable value of around 800 after 750 episodes. These synchronous trends confirm that the DQN agent has effectively learned the optimal weight adjustment strategy without divergence.
As shown in Figure 15, after only 10 training episodes the tracking performance is even worse than with the empirically chosen fixed weights. As the number of training episodes increases, however, the tracking performance improves significantly. Figure 15(b) to (e) shows that after 1000 episodes the maximum lateral deviation and heading angle deviation are further reduced by 36.6% and 19.7%, respectively.
Experimental results
A dual-motor-driven tracked unmanned vehicle platform was built, as shown in Figure 16, with its specific parameters listed in Table 2, to further verify the control performance and real-time capability of the proposed trajectory tracking control strategy based on the dynamics prediction model on a real vehicle (Table 3).

A real vehicle platform for the dual motor driven tracked vehicle.
Hyperparameters of the DQN algorithm.
Tracked unmanned vehicle parameters.
Due to safety restrictions at the test site, the vehicle speed was limited to v = 10.8 km/h. The experimental results are shown in Figure 17, and the test site is shown in Figure 18.

Experimental results of trajectory tracking for the tracked unmanned vehicle: (a) trajectory, (b) vehicle speed, (c) lateral deviation, (d) course deviation, (e) yaw rate, (f) motor power, (g) motor speed and (h) driving torque.

Experimental site of trajectory tracking for the tracked unmanned vehicle.
Because of the limitations of the experimental site and to ensure the safety of the real-vehicle experiment, the trajectory tracking performance of the unmanned tracked vehicle in the double-lane-change maneuver was verified only at low speed. The expected speed was set to 10.8 km/h; the experimental results are shown in Figure 17 and photos of the experiment in Figure 18. Figure 17(a) and (b) shows that the unmanned tracked vehicle using the proposed MPC based on the dynamics prediction model achieves good trajectory and speed tracking accuracy.
Conclusion
To address the poor tracking accuracy of traditional MPC trajectory tracking strategies based on kinematics prediction models, which neglect the coupling of longitudinal and lateral dynamics, an MPC trajectory tracking strategy that integrates kinematics and dynamics models is proposed. It achieves integrated control of trajectory tracking and handling stability based on yaw rate control. The maximum lateral deviation and heading angle deviation were reduced by 58.4% and 18.6%, respectively.
The weight matrix of the objective function in traditional MPC controllers is constant. However, as measurable disturbances such as the reference trajectory and unmeasurable disturbances such as the external environment change continuously, a fixed weight matrix cannot guarantee optimal control performance throughout the entire working process. A weight matrix adaptive adjustment method based on a deep Q-network (DQN) reinforcement learning strategy is proposed for the MPC that solves the optimal control torque, which further reduced the maximum lateral deviation and heading angle deviation by 36.6% and 19.7%, respectively.
Footnotes
Handling Editor: Aarthy Esakkiappan
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by National Key R&D Project under Grant 2022YFB2502702.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
