Abstract
Electric vehicle motors operating under high power density generate substantial heat, posing significant challenges to temperature regulation accuracy and cooling energy efficiency. Conventional thermal management strategies often struggle to achieve robust performance under highly dynamic and uncertain operating conditions. This paper proposes a hybrid thermal control strategy integrating active disturbance rejection control (ADRC) and Twin Delayed Deep Deterministic Policy Gradient (TD3) within a hierarchical architecture. A multi-heat-source coupled motor thermal model is established to capture complex thermal dynamics of stator, rotor, and cooling subsystems. The lower-layer ADRC ensures fast temperature tracking and disturbance suppression through an extended state observer, while the upper-layer TD3 optimizes cooling energy consumption by learning long-term policies. An adaptive coordination mechanism balances real-time regulation and energy efficiency optimization. Hardware-in-the-loop experiments are conducted on a 150 kW permanent magnet synchronous motor under standard driving cycles and extreme conditions. Results demonstrate that the proposed strategy reduces temperature control RMSE to 1.47 °C and cooling energy consumption by 31.9% compared with conventional PID control, while maintaining strong robustness under ±30% parameter perturbations. These findings indicate that the ADRC–TD3 hybrid strategy provides an effective solution for intelligent thermal management of electric vehicle motors.
Keywords
Introduction
Amid growing concern over energy security and environmental pollution, electric vehicles are developing rapidly as a leading class of new energy vehicles. The motor system is the core power unit of an electric vehicle, and its operating efficiency and reliability directly affect the vehicle’s power performance and service life. However, motors generate substantial heat during high power density operation. Severe temperature fluctuations reduce motor efficiency and can cause permanent magnet demagnetization and insulation aging, which makes motor thermal management a key technical bottleneck for further improvement of electric vehicle performance. Traditional thermal management strategies often struggle to balance temperature control accuracy against energy consumption, so more intelligent and efficient control methods are needed to cope with complex and variable operating conditions.
In recent years, extensive research has been conducted on electric vehicle motor thermal management. Among traditional control methods, active disturbance rejection control (ADRC) has been widely applied in motor control because of its strong robustness and disturbance rejection capability.1,2 The ADRC framework proposed by Zhang et al. effectively suppresses torque ripple in permanent magnet synchronous motors by actively compensating for internal and external disturbances. 3 Meanwhile, the rapid development of reinforcement learning has brought new solutions to thermal management control. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, a representative deep reinforcement learning algorithm, has demonstrated excellent performance in electric vehicle energy management.4,5 Wang et al. used model predictive control to achieve energy-efficient operation of thermal management systems, validating the potential of intelligent control methods for reducing system energy consumption. 6 Combining deep reinforcement learning with fuzzy logic control further enhanced the adaptability and robustness of control strategies.7,8 In the field of permanent magnet synchronous motor control, the TD3 algorithm has been successfully applied to the optimal control of wind turbine generators, demonstrating good dynamic response characteristics. 9 Tang and Zhang combined fuzzy control with the TD3 algorithm to achieve coordinated control of electro-hydraulic composite braking for electric vehicles. 10 These studies provide important references for intelligent motor thermal management, but existing methods still suffer from slow response and large overshoot when handling strongly coupled, nonlinear thermal dynamics.
Despite these advances, current thermal management methods still exhibit notable limitations when applied to electric vehicle motors. Purely model-based controllers often suffer from performance degradation under strong disturbances and parameter variations, while reinforcement learning-based approaches alone typically struggle with slow transient response and safety-critical constraint handling. In particular, achieving fast temperature regulation while maintaining long-term energy efficiency remains a challenging issue for systems with strongly coupled and nonlinear thermal dynamics.
To address these challenges, this paper proposes a hybrid thermal control strategy that integrates active disturbance rejection control and the Twin Delayed Deep Deterministic Policy Gradient algorithm within a hierarchical control framework. In the proposed architecture, ADRC serves as the lower-layer controller to ensure fast temperature tracking and disturbance rejection through an extended state observer, while TD3 operates at the upper layer to optimize cooling energy consumption by learning long-term control policies. The main contributions of this work are threefold. First, a motor thermal dynamic model considering multi-heat-source coupling effects is established to accurately characterize the thermal behavior of key motor components. Second, an adaptive temperature control scheme based on ADRC is designed to enhance robustness against uncertainties and disturbances. Third, a dual-layer ADRC–TD3 collaborative control strategy is developed to achieve coordinated optimization of temperature regulation performance and cooling energy efficiency. The effectiveness of the proposed method is validated on a hardware-in-the-loop experimental platform under both standard driving cycles and extreme operating conditions, providing practical insights for the intelligent design of electric vehicle motor thermal management systems.
The contributions of this study are summarized as follows:
A multi-heat-source coupled thermal modeling framework is developed for electric vehicle motors, capturing the strongly coupled and nonlinear thermal dynamics of key components under high power density and dynamic operating conditions.
A robust lower-layer temperature control strategy based on active disturbance rejection control is designed, in which an extended state observer enables real-time disturbance estimation and compensation, significantly improving transient response and robustness.
A dual-layer collaborative control architecture that integrates ADRC with the Twin Delayed Deep Deterministic Policy Gradient algorithm is proposed, allowing fast temperature regulation and long-term cooling energy optimization to be achieved simultaneously.
The proposed method is experimentally validated on a hardware-in-the-loop platform, demonstrating superior performance in temperature control accuracy, energy efficiency, dynamic response, and robustness compared with conventional control strategies.
Methods
Electric vehicle motor thermal management system modeling
Accurate modeling of the thermal management system of a permanent magnet synchronous motor requires comprehensive consideration of heat source distribution, heat transfer paths, and cooling mechanisms. Heat generated during motor operation comes mainly from stator copper loss, iron core loss, and rotor eddy current loss, and the spatial distribution and time-varying characteristics of these heat sources determine the dynamic evolution of the motor temperature field. 11
In this study, a hybrid modeling approach combining distributed-parameter and lumped-parameter methods is adopted. The distribution characteristics of the motor’s internal temperature field are obtained through three-dimensional finite element analysis, while the thermal dynamic behavior of key components is described using an equivalent thermal network. The heat conduction equation can be expressed as:
where
As shown in Figure 1, the simplified lumped-parameter model divides the motor into five key thermal nodes: stator winding, stator core, rotor magnet, rotor core, and housing. The temperature dynamics of each node can be described as:
where

Schematic diagram of motor thermal network model structure.
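As an illustration of the five-node lumped-parameter network described above, the following Python sketch integrates the node temperatures with an explicit Euler step. The node-balance form (heat capacity times temperature rate equals local loss plus conductive exchange with neighbouring nodes), the network topology, and every numerical value (capacities, resistances, losses, coolant temperature) are assumptions chosen for illustration; they are not the parameters identified in this study.

```python
# Minimal five-node thermal network sketch (all parameters are hypothetical).
NODES = ["winding", "stator_core", "magnet", "rotor_core", "housing"]

# Assumed heat capacities [J/K]
C = {"winding": 2000.0, "stator_core": 8000.0, "magnet": 1500.0,
     "rotor_core": 5000.0, "housing": 10000.0}

# Assumed thermal resistances between connected node pairs [K/W]
R = {("winding", "stator_core"): 0.02,
     ("stator_core", "housing"): 0.01,
     ("magnet", "rotor_core"): 0.03,
     ("rotor_core", "stator_core"): 0.10}   # air-gap path
R_HOUSING_COOLANT = 0.01                    # housing to coolant [K/W]

# Assumed constant losses per node [W]
EXAMPLE_LOSSES = {"winding": 1000.0, "stator_core": 500.0, "magnet": 50.0,
                  "rotor_core": 100.0, "housing": 0.0}


def step(temps, losses, t_coolant, dt):
    """Advance all node temperatures by one explicit Euler step of dt seconds."""
    heat = {n: losses[n] for n in NODES}    # net heat flow into each node [W]
    for (a, b), r in R.items():
        q = (temps[a] - temps[b]) / r       # conductive heat flow a -> b [W]
        heat[a] -= q
        heat[b] += q
    # Only the housing exchanges heat with the coolant in this sketch.
    heat["housing"] -= (temps["housing"] - t_coolant) / R_HOUSING_COOLANT
    return {n: temps[n] + dt * heat[n] / C[n] for n in NODES}


def simulate(losses, t_coolant, t_init, dt=1.0, steps=20000):
    """Integrate from a uniform initial temperature and return final temperatures."""
    temps = {n: t_init for n in NODES}
    for _ in range(steps):
        temps = step(temps, losses, t_coolant, dt)
    return temps


if __name__ == "__main__":
    final = simulate(EXAMPLE_LOSSES, t_coolant=65.0, t_init=65.0)
    for name in NODES:
        print(f"{name:12s} {final[name]:6.1f} degC")
```

With these assumed values, the steady-state energy balance requires all 1650 W of losses to leave through the housing, which places the housing 16.5 K above the coolant. A check of this kind is a useful sanity test before fitting the network to finite element results.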
The dynamic characteristics of the cooling system are described through a fluid network model, considering the influence of coolant flow rate, radiator efficiency, and ambient temperature. 12 The coolant temperature change rate is:
where
Active disturbance rejection controller (ADRC) design
The ADRC design is based on the principle of estimating and compensating for system uncertainties and external disturbances in a unified manner through an extended state observer (ESO). 13 An improved active disturbance rejection controller was designed according to the characteristics of the motor temperature control system.
The system model can be expressed in standard form as:
where
The ESO adopts a nonlinear structure to enhance disturbance estimation accuracy:
where
where
where

Active disturbance rejection controller structure block diagram.
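To make the role of the extended state observer concrete, the sketch below implements a second-order nonlinear ESO for a first-order temperature channel, together with a simple disturbance-compensating control law. It assumes the commonly used `fal` nonlinearity and a plant of the form dy/dt = f + b0·u, where f is the lumped total disturbance; the gains `beta1`, `beta2`, the exponent `alpha`, the linear-zone width `delta`, and the proportional gain `kp` are hypothetical values, not the tuned parameters of this study.

```python
def fal(e, alpha, delta):
    """Nonlinear gain: linear for |e| <= delta, power law with exponent alpha outside."""
    if abs(e) <= delta:
        return e / (delta ** (1.0 - alpha))
    sign = 1.0 if e > 0 else -1.0
    return sign * (abs(e) ** alpha)


class NonlinearESO:
    """Second-order ESO: z1 tracks the output y, z2 tracks the total disturbance f."""

    def __init__(self, beta1, beta2, b0, h, alpha=0.5, delta=0.01):
        self.beta1, self.beta2 = beta1, beta2   # observer gains (assumed)
        self.b0 = b0                            # nominal input gain
        self.h = h                              # sampling period [s]
        self.alpha, self.delta = alpha, delta
        self.z1 = 0.0
        self.z2 = 0.0

    def update(self, y, u):
        """One Euler step of the observer using measurement y and input u."""
        e = self.z1 - y
        dz1 = self.z2 - self.beta1 * e + self.b0 * u
        dz2 = -self.beta2 * fal(e, self.alpha, self.delta)
        self.z1 += self.h * dz1
        self.z2 += self.h * dz2
        return self.z1, self.z2


def adrc_control(ref, z1, z2, kp, b0):
    """Proportional feedback on the estimated state, minus the estimated disturbance."""
    return (kp * (ref - z1) - z2) / b0


if __name__ == "__main__":
    # Hypothetical plant: dy/dt = -a*y + d + b0*u, so f = -a*y + d.
    a, d, b0, h = 0.1, 2.0, 1.0, 0.01
    eso = NonlinearESO(beta1=100.0, beta2=300.0, b0=b0, h=h)
    y, u = 0.0, -1.0
    for _ in range(10000):
        eso.update(y, u)
        y += h * (-a * y + d + b0 * u)
    print("y =", y, "z1 =", eso.z1, "f =", -a * y + d, "z2 =", eso.z2)
```

Because the observer estimates f without using the plant coefficients `a` and `d`, the same structure keeps working when thermal parameters drift, which is the property the lower control layer relies on.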
Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm architecture
The TD3 algorithm mitigates value function overestimation in deep reinforcement learning through three key mechanisms: dual Q-networks, delayed policy updates, and target policy smoothing. 14 In this study, TD3 is employed as the upper-layer controller to optimize long-term cooling energy consumption in the motor thermal management system.
The state space
where
The action space
where
The reward function design needs to balance multiple objectives 15 :
where
As shown in Figure 3, the TD3 algorithm includes one Actor network
where

TD3 algorithm network architecture.
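The three TD3 mechanisms named above can be summarized in a few lines. The following sketch computes the critic target with target policy smoothing and the clipped double-Q minimum, and exposes the delayed actor update as a simple schedule. The networks are represented by plain callables, and the hyperparameters (`gamma`, `noise_std`, `noise_clip`, `policy_delay`, action bounds) are commonly used defaults assumed here for illustration rather than the settings of this study.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done, actor_target, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               a_low=-1.0, a_high=1.0, rng=random):
    """Return the TD3 critic target for one transition.

    actor_target(state) -> list of action components
    q*_target(state, action) -> scalar value estimate
    """
    # Target policy smoothing: perturb the target action with clipped noise.
    next_action = []
    for a in actor_target(next_state):
        eps = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
        next_action.append(clip(a + eps, a_low, a_high))
    # Clipped double-Q: use the smaller of the two target critics.
    q_min = min(q1_target(next_state, next_action),
                q2_target(next_state, next_action))
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * q_min


def should_update_actor(critic_step, policy_delay=2):
    """Delayed policy update: refresh the actor every policy_delay critic steps."""
    return critic_step % policy_delay == 0
```

Taking the minimum of the two critics biases the target downward, which counteracts the overestimation that a single critic tends to accumulate.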
ADRC–TD3 hybrid control strategy integration framework
The proposed hybrid control strategy integrates ADRC and TD3 within a hierarchical architecture to exploit their respective strengths. 16 In this structure, TD3 operates at the upper layer to perform long-term energy optimization and generate reference trajectories, while ADRC functions at the lower layer to ensure fast temperature tracking and disturbance rejection.
As illustrated in Figure 4, the core of the integration framework lies in the coordination mechanism between the two control layers. The TD3 module generates a temperature reference trajectory and a feedforward control signal:
where
where
where
where

ADRC–TD3 hybrid control architecture.
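One way to picture the coordination between the two layers is a weighted blend of the TD3 feedforward signal and the ADRC feedback signal. The sketch below is a hypothetical weighting rule, not the coordination law of this study: the feedback weight grows with the magnitude of the tracking error, so the learned feedforward term dominates near the setpoint and ADRC dominates during large deviations. The function names, the scale `e_ref`, and the actuator limits are assumptions.

```python
def feedback_weight(error, e_ref=2.0):
    """Weight in [0, 1) on the ADRC feedback term; e_ref [degC] is an assumed scale."""
    mag = abs(error)
    return mag / (mag + e_ref)


def hybrid_command(u_ff, u_fb, error, u_min=0.0, u_max=1.0):
    """Blend feedforward (TD3) and feedback (ADRC) commands, then saturate."""
    w = feedback_weight(error)
    u = (1.0 - w) * u_ff + w * u_fb
    return max(u_min, min(u_max, u))


if __name__ == "__main__":
    # Small error: command stays close to the feedforward value.
    print(hybrid_command(u_ff=0.4, u_fb=0.9, error=0.1))
    # Large error: command moves toward the feedback value.
    print(hybrid_command(u_ff=0.4, u_fb=0.9, error=8.0))
```

Saturating the blended command keeps the supervisory layer from requesting actuator settings outside the physical range, whatever weighting rule is used.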
Experimental platform construction and test environment configuration
The experimental platform was built based on a 150 kW permanent magnet synchronous motor, integrating a complete thermal management system and data acquisition system. 17 Figure 5 illustrates the hardware-in-the-loop (HIL) experimental configuration for electric vehicle motor thermal management.

Hardware-in-the-loop experimental platform for electric vehicle motor thermal management.
As shown in Figure 5, the ADRC–TD3 control algorithm is executed on a dSPACE MicroAutoBox II real-time controller in a closed-loop configuration. The controller generates control commands for the motor driver and cooling actuators, while temperature, flow, and pressure measurements are fed back to the controller in real time. This HIL setup provides the experimental basis for evaluating the fast dynamic response and real-time feasibility reported in Table 1. All experimental results presented in this paper are obtained from the physical HIL bench rather than numerical simulation. The technical specifications of the key equipment are summarized in Table 2.
Dynamic response performance comparison under extreme conditions.
Main equipment parameters of experimental platform.
Test environment configuration includes standard and extreme operating conditions. Standard conditions use NEDC, WLTC, CLTC cycles; extreme conditions include rapid acceleration (0–100 km/h), continuous hill climbing (30% gradient), and high temperature environment (45 °C).18,19 The test condition design referenced the latest domestic and international energy management strategy research results.20,21 The energy management of hybrid electric vehicles involves coordinated control of multiple energy storage systems, and this paper drew on relevant control architecture design experience.22,23 Energy management methods based on real-time model predictive control provided theoretical support for online optimization of the TD3 algorithm. 24
Algorithm implementation process and pseudocode
The ADRC–TD3 hybrid control algorithm is implemented using a modular design to facilitate debugging, parameter tuning, and future extensibility. 25 The overall execution flow of the hybrid control strategy is summarized in Table 3.
ADRC–TD3 hybrid control main program pseudocode.
Table 3 presents the main program structure of the ADRC–TD3 hybrid controller. During operation, sensor measurements are first collected to construct the system state vector.26,27 The TD3 actor network generates supervisory control actions, which are decoded into a temperature reference trajectory and feedforward control signals. 28 The ADRC controller then computes feedback compensation based on the temperature tracking error and the estimated disturbances from the extended state observer (ESO). 29 The final control command is obtained through adaptive weighting between feedforward and feedback components and is applied to the motor thermal management system. The TD3 network training is performed using offline batch updates, with network parameters updated once every 100 control cycles. This training schedule reduces online computational burden while maintaining policy learning stability.
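The multi-rate execution described above can be sketched as a simple scheduler: the control step runs every cycle, and the TD3 parameter update runs once every 100 cycles. The update interval and the 10 ms period follow the text; the function names and the callback interface are hypothetical placeholders for the actual modules.

```python
CONTROL_PERIOD_S = 0.010   # lower-layer control period reported in the text
TD3_UPDATE_EVERY = 100     # control cycles between TD3 parameter updates


def run_cycles(n_cycles, control_step, td3_update):
    """Run n_cycles control cycles and return the number of TD3 updates performed.

    control_step(k): sense, run the actor, run ADRC, blend, actuate (placeholder)
    td3_update(k):   batch update of the TD3 networks (placeholder)
    """
    updates = 0
    for k in range(1, n_cycles + 1):
        control_step(k)
        if k % TD3_UPDATE_EVERY == 0:
            td3_update(k)
            updates += 1
    return updates


if __name__ == "__main__":
    n = 250
    count = run_cycles(n, lambda k: None, lambda k: None)
    print(f"{n * CONTROL_PERIOD_S:.2f} s of control, {count} TD3 updates")
```

Keeping the network update outside the per-cycle path is what allows the fast loop to meet its deadline regardless of how long a batch update takes.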
Performance evaluation index and comparison benchmark setting
A comprehensive performance evaluation system is constructed from four dimensions: control accuracy, energy efficiency, dynamic response, and robustness. Temperature control accuracy is evaluated using the following metrics:
Energy efficiency indicators are defined as:
where
Dynamic response indicators include rise time
Table 4 lists the configuration parameters of comparison benchmark algorithms.
Comparison algorithm parameter configuration.
Robustness is evaluated by introducing parameter perturbations and external disturbances. The parameter perturbation range is set to ±30% of the nominal value, and external disturbances include step load changes, random noise interference, and sensor faults. The performance degradation rate is defined as:
where
The construction of the performance evaluation system drew on the evaluation method of adaptive equivalent ratio model predictive control. 30 The application of machine learning methods in hybrid electric vehicle energy management provided reference benchmarks for algorithm performance comparison. 31 Control strategy optimization based on energy flow experiments validated the rationality of the experimental scheme in this paper. 32 Reinforcement learning control research on battery and cabin thermal management provided technical reference for system integration testing. 33
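The accuracy, energy, and robustness indicators discussed in this section can be computed with a few helper functions. The sketch below assumes the usual definitions: RMSE and maximum absolute deviation of measured temperature against the reference, energy saving rate as the relative reduction against a baseline, and degradation rate as the relative increase of a cost-type index under perturbation. These definitions and the function names are assumptions for illustration and may differ in detail from the formulas used in this study.

```python
import math


def rmse(measured, reference):
    """Root-mean-square error between two equal-length sequences."""
    n = len(measured)
    return math.sqrt(sum((m - r) ** 2 for m, r in zip(measured, reference)) / n)


def max_abs_deviation(measured, reference):
    """Largest absolute deviation from the reference."""
    return max(abs(m - r) for m, r in zip(measured, reference))


def energy_saving_rate(e_baseline, e_proposed):
    """Relative energy reduction against the baseline, in percent."""
    return 100.0 * (e_baseline - e_proposed) / e_baseline


def degradation_rate(j_nominal, j_perturbed):
    """Relative increase of a cost-type index under perturbation, in percent."""
    return 100.0 * (j_perturbed - j_nominal) / j_nominal


if __name__ == "__main__":
    temps = [86.0, 84.0, 85.5, 84.5]          # hypothetical samples [degC]
    target = [85.0] * len(temps)
    print("RMSE:", rmse(temps, target))
    print("Max deviation:", max_abs_deviation(temps, target))
    print("Energy saving [%]:", energy_saving_rate(100.0, 68.0))
    print("Degradation [%]:", degradation_rate(2.0, 2.3))
```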
For the PID baseline, the gains
Results
Temperature control accuracy analysis
Motor temperature control accuracy directly affects system safety and reliability. This section systematically evaluates the temperature regulation performance of the ADRC–TD3 hybrid control strategy under different operating conditions based on multiple experimental trials. The target temperature was set to 85 °C with an ambient temperature of 25 °C. Temperature responses of key components, including the stator winding and rotor magnet, were measured and analyzed.
As shown in Figure 6, during the 1180 s NEDC driving cycle test, the ADRC–TD3 strategy achieved superior temperature tracking performance. The maximum stator temperature deviation during load transitions was limited to 2.3 °C, representing a 73.6% reduction compared with the 8.7 °C deviation observed under conventional PID control. In addition, the settling time required to reach steady-state temperature was reduced from 48 s (PID) to 23 s, corresponding to a 52.1% improvement in response speed.

Stator temperature tracking curve under NEDC condition.
Table 5 details the temperature control accuracy statistics for different parts.
Temperature control accuracy comparison for key parts (NEDC condition).
As shown in Table 5, the consistent reduction in RMSE, maximum deviation, and overshoot across all monitoring positions indicates that the proposed ADRC–TD3 strategy improves both transient and steady-state temperature regulation. This improvement can be mainly attributed to the complementary roles of the two control layers. The ADRC component effectively compensates for unmodeled thermal dynamics and external disturbances, thereby reducing peak deviations during load changes, while the TD3-based high-level controller adjusts cooling intensity to avoid aggressive control actions that typically lead to overshoot. As a result, the hybrid strategy achieves not only lower tracking error but also smoother temperature evolution, which is particularly important for preventing thermal fatigue in motor components.
As shown in Figure 7, under the more dynamic WLTC condition, the temperature control error is approximately normally distributed. The error distribution of the ADRC–TD3 strategy is more concentrated, with 95% of sampling points having temperature errors within ±2 °C, whereas the errors under PID control spread over ±5 °C. The error standard deviation decreased from 3.2 °C for PID to 1.1 °C, indicating a significant improvement in control stability. The narrower distribution further suggests that the ADRC–TD3 strategy provides more consistent control under highly dynamic driving scenarios. The reduced error variance implies that the controller maintains stable performance despite frequent operating point transitions, which is difficult to achieve with fixed-parameter PID control, and it highlights the robustness advantage of the proposed hybrid control architecture.

Temperature control error distribution under WLTC condition.
Energy consumption optimization performance evaluation
Cooling system energy consumption is an important factor affecting overall vehicle range. This section evaluates the energy consumption optimization effect of the hybrid control strategy from three dimensions: power consumption, energy efficiency, and energy saving rate.
As shown in Figure 8, under the standard driving cycle, the average cooling power of the ADRC–TD3 strategy was 1.85 kW, representing a 32.0% reduction compared with 2.72 kW under PID control and a 19.9% reduction compared with 2.31 kW under MPC. The peak cooling power decreased from 4.8 kW (PID) to 3.2 kW under the proposed strategy, thereby reducing instantaneous battery load during high-demand phases.

Cooling system power consumption curve under typical conditions.
Table 6 summarizes the energy consumption statistics under different operating conditions. The ADRC–TD3 strategy achieved an energy saving rate exceeding 30% across all tested scenarios, while the cooling energy ratio (CER) remained below 3.5%. The reduction in both average and peak cooling power demonstrates that the proposed strategy avoids unnecessary overcooling, which is a common drawback of rule-based or purely feedback controllers. By jointly considering temperature regulation and energy consumption objectives, the TD3-based supervisory controller enables anticipative cooling actions, while ADRC ensures fast correction when thermal disturbances occur. This coordinated behavior explains why the energy saving rate consistently exceeds 30% across different operating conditions, rather than being limited to a specific driving cycle.
Cooling system energy consumption comparison analysis under different conditions.
As shown in Figure 9, the hybrid control strategy optimizes energy consumption by coordinating the working states of the pump and fan. In the low load phase (0–300 s), the pump flow rate is held at 18 L/min and the fan power is only 120 W; in the high load phase (600–900 s), the system responds quickly by increasing the flow rate to 48 L/min and the fan power to 680 W; in the steady-state phase, precise control avoids unnecessary overcooling. The coordinated adjustment of pump flow rate and fan power illustrates how the hybrid controller dynamically allocates cooling resources based on real-time thermal demand. Such coordination reduces mechanical wear of auxiliary components and lowers instantaneous battery load, which contributes to improved system reliability and long-term energy efficiency.

Coordinated optimization curve of pump flow rate and fan power.
Control response characteristics under dynamic conditions
The frequent operating condition transitions in actual electric vehicle operation place strict requirements on the dynamic response capability of the control system.
As shown in Figure 10, during the 0–100 km/h rapid acceleration test, motor power increased from 15 to 150 kW within 12 s. Under this condition, the ADRC–TD3 strategy limited the temperature rise rate to 2.5 °C/s, with a maximum temperature of 91.3 °C, remaining below the 95 °C safety threshold. In contrast, the PID controller exhibited a temperature rise rate of 4.1 °C/s and a maximum temperature of 97.2 °C, indicating potential overheating risk. The response time of the hybrid strategy was 2.8 s, which was ∼45% shorter than that of PID control.

Temperature dynamic response under rapid acceleration condition.
Table 1 summarizes the dynamic response metrics under extreme operating conditions. Compared with ADRC-only and TD3-only strategies, the hybrid controller achieved both the shortest response time and the lowest temperature rise rate. These results suggest that neither classical feedback control nor reinforcement learning alone was sufficient to effectively manage rapid power transitions. The ADRC layer provided fast disturbance rejection during abrupt changes, while the TD3 layer mitigated delayed or excessive cooling actions, thereby improving safety margins under aggressive driving scenarios.
To evaluate the real-time capability of the proposed ADRC–TD3 hybrid control strategy, the computational latency and control update period were measured under hardware-in-the-loop conditions. The low-level ADRC controller operates at a fixed control period of 10 ms, while the high-level TD3 policy is updated at a lower frequency and only provides supervisory setpoints. As a result, the worst-case execution time of the entire control loop remains below 8.4 ms, ensuring that all control actions are completed within a single sampling period.
Compared with MPC and pure TD3 controllers, the proposed hierarchical strategy significantly reduces online computational burden by decoupling fast control from learning-based optimization. Although the overall computational complexity is higher than that of PID and ADRC-only methods, the measured CPU utilization remains below 12% on the target embedded platform, indicating that the proposed algorithm satisfies real-time requirements for vehicle-mounted applications.
As shown in Figure 11, during the continuous urban–suburban–highway transition test, the system experienced six major power step changes. The ADRC–TD3 strategy completed adaptive adjustment within ∼15 s after each transition, maintaining temperature fluctuations within ±2.2 °C. In contrast, traditional control methods exhibited temperature fluctuations exceeding ±5 °C and required more than 40 s to re-stabilize.

Temperature control performance during continuous operating condition transitions.
Algorithm convergence and stability validation
Training efficiency and convergence behavior are critical considerations for the practical deployment of reinforcement learning algorithms in engineering applications. As shown in Figure 12, the cumulative reward of the TD3 algorithm increased progressively during training. After ∼8500 training episodes, the reward value converged from an initial value of −420 to around −128, with post-convergence fluctuations remaining within ±15. In comparison, the standard DDPG algorithm required ∼16,000 episodes to reach convergence. This indicates that the TD3 algorithm achieved ∼46.9% faster convergence under the same training conditions.

TD3 algorithm training process convergence curve.
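A convergence point such as the one reported above is usually read from the reward curve with an explicit criterion. The sketch below shows one possible criterion, stated as an assumption: the convergence episode is the first episode from which every later reward stays within a fixed band around the mean of the final window. The band of ±15 matches the post-convergence fluctuation quoted in the text, while the window length and the synthetic reward curve are hypothetical.

```python
def convergence_episode(rewards, window=100, band=15.0):
    """Return the first index from which all rewards stay within +/- band
    of the mean reward over the final `window` episodes."""
    tail = rewards[-window:]
    final_mean = sum(tail) / len(tail)
    i = len(rewards)
    # Walk backwards while rewards remain inside the band.
    while i > 0 and abs(rewards[i - 1] - final_mean) <= band:
        i -= 1
    return i


if __name__ == "__main__":
    # Synthetic curve: linear rise from -420 toward -128, then a flat plateau.
    ramp = [-420.0 + (292.0 / 50.0) * k for k in range(50)]
    plateau = [-128.0] * 100
    print(convergence_episode(ramp + plateau))
```

Stating the criterion explicitly makes episode counts comparable across algorithms, because the same band and window are applied to every reward curve.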
Table 7 presents the convergence performance under different initialization strategies. The results show that pre-training initialization significantly accelerated convergence, reducing the required training episodes by 52.7% and increasing the convergence success rate to 96%. The improved convergence behavior can be attributed to the structural advantages of TD3, including reduced overestimation bias and delayed policy updates, which enhance training stability. Furthermore, the results indicate that pre-training and transfer learning reduced the exploration burden, which is particularly important for practical deployment scenarios where extensive online training is infeasible.
TD3 algorithm convergence performance statistics.
These results demonstrate that the proposed learning framework achieves stable convergence while maintaining practical training efficiency. As shown in Figure 13, the estimation errors of the extended state observer (ESO) for system states and total disturbances were maintained within 3% and 5%, respectively. Under a 20% parameter perturbation condition, the estimation accuracy remained within an acceptable range, confirming the robustness of the observer design. When the observer bandwidth increased from 10 to 50 rad/s, the estimation delay decreased from 15 to 3 ms; however, noise amplification effects became more pronounced, illustrating the trade-off between response speed and noise sensitivity.

Extended state observer estimation accuracy.
Comparative analysis with traditional control strategies
A comprehensive comparative analysis was conducted to evaluate the overall advantages of the proposed hybrid control strategy. As shown in Figure 14, six performance dimensions were considered: control accuracy, energy saving performance, response speed, robustness, computational efficiency, and implementation complexity. The ADRC–TD3 strategy achieved high scores in control accuracy (9.2), energy saving performance (9.0), and robustness (8.9), resulting in an overall composite score of 8.53, which was higher than those of the comparison methods.

Comprehensive performance radar chart of different control strategies.
Table 8 presents detailed quantitative comparisons of the performance indicators. The radar chart and composite evaluation results indicate that the ADRC–TD3 strategy achieved balanced performance across multiple dimensions, rather than optimizing a single metric at the expense of others. Although computational resource consumption was higher than that of purely classical controllers, the improvements in control accuracy, robustness, and energy efficiency suggest that the additional computational cost is acceptable for safety-critical electric vehicle applications.
Comprehensive performance comparison analysis of control strategies.
As shown in Figure 15, a cost–benefit analysis over a 5-year operational period was performed. Although the initial implementation cost of the ADRC–TD3 strategy increased by ∼1500 CNY, the energy savings and reduced maintenance costs allowed the additional investment to be recovered by the fourth year. Over 5 years, the total net benefit was estimated to reach ∼2850 CNY.

Long-term operating cost–benefit analysis.
Robustness test results
System robustness is a critical factor in ensuring reliable practical deployment. As shown in Figure 16, during the motor parameter perturbation test, thermal capacity and thermal resistance parameters were varied within ±30% of their nominal values. Under these conditions, the performance degradation rate of the ADRC–TD3 strategy was 15.3%, whereas the degradation rate of the PID controller reached 48.7%. These results indicate that the hybrid strategy maintained stable performance despite significant model uncertainty. The robustness improvement can be attributed to the real-time disturbance estimation capability of the extended state observer (ESO) and the adaptive adjustment mechanism of the TD3 supervisory layer. Together, these mechanisms enhanced the controller’s ability to compensate for parameter deviations and external disturbances.

Performance degradation curve under parameter perturbation.
Table 9 summarizes the robustness test results under various disturbance conditions. As shown in Table 9, the performance degradation rate of the ADRC–TD3 strategy was maintained within 20% across all tested disturbance scenarios, and the recovery time was consistently the shortest among the compared methods.
Robustness test data under different disturbance types.
As illustrated in Figure 17, during the temperature sensor fault simulation, when the primary sensor failed, the system switched to an ESO-based temperature estimation mode within 1.5 s, and the control performance degraded by only 8%. In the actuator fault scenario involving partial pump failure (flow rate reduced by 40%), the controller compensated by increasing fan speed and adjusting valve opening, while maintaining the temperature below the safety threshold. Overall, the robustness and fault-tolerance results indicate that the proposed hybrid control strategy maintained acceptable performance under a wide range of disturbances and fault conditions. The extended state observer enabled real-time disturbance reconstruction and state estimation, while the TD3 supervisory controller adaptively adjusted cooling commands to compensate for actuator and sensor limitations. This cooperative mechanism prevented abrupt performance collapse and supported graceful degradation, which is essential for reliability and safety in real-world electric vehicle operation.

Fault-tolerant control performance under fault modes.
The experimental results demonstrate that the ADRC–TD3 hybrid control strategy not only improves quantitative performance metrics such as temperature tracking accuracy and energy consumption, but also enhances dynamic responsiveness, robustness, and fault tolerance. These analytical observations confirm that the proposed method offers a practical and reliable solution for electric vehicle motor thermal management under complex and uncertain operating conditions.
Discussion
The ADRC–TD3 hybrid control strategy proposed in this study demonstrates clear advantages in electric vehicle motor thermal management by combining the fast disturbance rejection capability of active disturbance rejection control with the long-term optimization ability of deep reinforcement learning. Compared with purely classical or purely learning-based controllers, the hierarchical architecture allows each control layer to focus on complementary objectives, improving overall stability and efficiency. In particular, this design reduces the reliance on an accurate thermal model while retaining a clear control structure, which is beneficial for engineering deployment where parameter drift and operating-condition variability are common.
From an engineering perspective, the improved temperature regulation accuracy and energy savings can be attributed to separating fast disturbance rejection from long-term energy optimization. The ADRC layer suppresses model uncertainties and external disturbances in real time, while the TD3-based high-level controller adjusts the control policy according to global performance objectives. This cooperation enables stable performance under dynamic operating conditions, which is particularly important for real-world electric vehicle applications characterized by frequent load changes and environmental variations. Moreover, the hierarchical coordination helps mitigate the typical stability and safety concerns of end-to-end reinforcement learning controllers by constraining the learning component to a supervisory role, thereby improving controllability and interpretability in safety-critical thermal regulation tasks.
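The idea of confining the learning component to a supervisory role can be sketched as a two-rate loop in which a slow supervisor proposes a value and the fast inner law only ever sees that value after it has been clipped to a fixed range. This is an illustrative sketch under stated assumptions: the supervisor is assumed to propose a temperature setpoint, and the names SupervisedLoop, inner_law, supervisor, setpoint_bounds, and ratio are hypothetical. The interface actually used between the TD3 and ADRC layers in this study may differ.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


class SupervisedLoop:
    """Runs a fast inner control law with a slower, bounded supervisory update."""

    def __init__(self, inner_law, supervisor, setpoint_bounds, ratio):
        self.inner_law = inner_law              # callable(setpoint, measurement) -> command
        self.supervisor = supervisor            # callable(measurement) -> proposed setpoint
        self.setpoint_bounds = setpoint_bounds  # (low, high) range the supervisor cannot leave
        self.ratio = ratio                      # inner steps per supervisory update
        self.setpoint = setpoint_bounds[0]      # placeholder, overwritten on the first step
        self._tick = 0

    def step(self, measurement):
        # The supervisor runs only every `ratio` inner steps, and its proposal
        # is clipped before the inner law uses it.
        if self._tick % self.ratio == 0:
            proposed = self.supervisor(measurement)
            self.setpoint = clamp(proposed, *self.setpoint_bounds)
        self._tick += 1
        return self.inner_law(self.setpoint, measurement)
```

Because the inner law receives only clipped setpoints, an arbitrary output from the learned policy cannot push the fast loop outside the configured range, which is one simple way to realize the supervisory constraint described above.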
Although the proposed strategy exhibits higher computational complexity than traditional control methods, the hierarchical design and task scheduling mechanism ensure that real-time requirements are still satisfied in vehicle-mounted applications. This balance between control performance and computational cost indicates good practical feasibility. Nevertheless, the method currently focuses on motor-level thermal management, and interactions with other thermal subsystems are not explicitly considered. For example, thermal coupling among the motor, power electronics, and battery cooling loop may alter the optimal cooling allocation under real driving conditions, potentially affecting vehicle-level energy efficiency. In addition, the TD3 policy still requires representative training data to generalize across different ambient temperatures, aging states, and hardware configurations. This limitation motivates further investigation into integrated thermal management strategies at the vehicle level, as well as more sample-efficient and transferable learning schemes for reducing calibration and deployment costs.
Conclusion
This paper proposed a hierarchical hybrid thermal control strategy integrating active disturbance rejection control and the Twin Delayed Deep Deterministic Policy Gradient algorithm for electric vehicle motor thermal management. By combining fast disturbance rejection with long-term energy optimization, the proposed framework enables coordinated improvement in temperature regulation performance and cooling energy efficiency. Hardware-in-the-loop experiments conducted on a 150 kW permanent magnet synchronous motor under standard driving cycles and extreme operating conditions show that the proposed strategy achieves better temperature tracking accuracy, lower cooling energy consumption, faster dynamic response, and stronger robustness than conventional control methods. These results demonstrate the effectiveness of integrating classical control theory with deep reinforcement learning for managing complex motor thermal dynamics. Despite these performance advantages, several limitations remain. The deep reinforcement learning component relies on offline training with representative operating data, which may limit direct transferability to motors or cooling systems with significantly different thermal characteristics. In addition, although the hierarchical design alleviates part of the real-time burden, the overall computational complexity is still higher than that of purely classical control strategies. Future work will therefore focus on reducing computational complexity and training requirements through more efficient learning strategies, as well as extending the proposed framework to integrated vehicle-level thermal management systems involving multiple coupled thermal subsystems, such as motors, batteries, and power electronics.
Footnotes
Handling Editor: Xiang Tian
Author contributions
Yongming Shao: writing – original draft, review, and editing, conceptualization. Weifeng Guo: formal analysis, methodology, validation. Shun Lu: conceptualization, formal analysis. Xinyi Chen: methodology, validation.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Specialized Cluster Construction Project (2024TSZY001), the Development and Research of Advanced Manufacturing Technology Experimental Teaching System (2023syyj054), and the Research on Steering Stability Control of Automobiles Based on EPS (zjt24001).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The data are available from the corresponding author on reasonable request.
