Abstract
In electrified powertrains, accurate temperature prediction of electric machines is essential for avoiding thermal derating, improving efficiency, and enabling condition monitoring. However, direct sensing of critical internal temperatures is often limited by cost, packaging constraints, and functional-safety requirements. This motivates the use of virtual-sensor concepts and data-driven digital twins. This work develops a data-driven digital twin for predicting electric machine temperature behavior under highly dynamic vehicle load conditions. Measurements were acquired on a full-vehicle road-to-rig test bench, enabling reproducible excitation of realistic driving profiles while instrumenting the rear traction machine with additional thermal sensing. Signals were filtered and downsampled from 50 to 1 Hz, and a vehicle-plausible feature set (e.g. vehicle speed, road gradient, and wheel-load-machine parameters) was used to support later in-vehicle deployment. Three model classes were investigated: (i) a cascaded Gaussian process regression with periodic re-initialization, (ii) long short-term memory networks, and (iii) gated recurrent unit networks. The results show a clear trade-off between short-horizon accuracy and long-horizon prediction capability. On the test set, Gaussian process regression achieved the highest short-term accuracy (MAE 0.4 K, MaxAE 1.5 K) with low memory demand (41,904 bytes), whereas the long short-term memory and gated recurrent unit networks enabled stable long-horizon prediction at high throughput, but with larger maximum errors (MaxAE 19.5 K and 12.2 K, respectively). Overall, the results quantify the trade-off between forecast horizon, accuracy, and embedded feasibility for digital-twin-based temperature prediction in vehicle operation.
Keywords
Introduction
Motivation and problem statement
Over the past few years, electromobility has become increasingly important. 1 The central element of an electromechanical powertrain is the electric machine.2,3 The shift in vehicle design from internal combustion engines to electric drive concepts presents a wide range of new challenges for automotive engineering. 4 Sustainability is also gaining importance in the context of electromobility, not least because of growing customer interest in this topic. The considerable amounts of raw materials required to manufacture an electric vehicle must therefore be used as efficiently as possible. 5 In addition, the thermal behavior of the electric machine is crucial for its efficiency, operational safety, and service life. 1 Overheating of the electric machine during operation can have far-reaching consequences, ranging from efficiency losses and accelerated aging of components to component damage and safety-critical conditions. 2
The main objective of this article is to develop a precise temperature prediction model under dynamic load conditions. 5 This model for temperature prediction and monitoring can be used to protect the components of the electromechanical drive train and delay their aging caused by heating. In addition, this work can contribute to increasing the ecological efficiency of an electric vehicle. Precise temperature prediction helps to extend the service life of the machine during operation and reduce development loops and additional prototypes during the development phase. 4 To date, only conventional temperature monitoring methods using physical sensors, such as thermocouples in the coil windings of the stator of an electric machine, have been available. 6
According to ISO/IEC 30173:2023, 7 a digital twin is a “digital representation of a target entity with data connections that enable convergence between the physical and digital states at an appropriate rate of synchronization.” In the context of the present work, this means that a digital twin for electric-machine temperature prediction must not only provide sufficient prediction accuracy but must also remain suitable for vehicle-side implementation in terms of model size, inference speed, and computational effort. 7 This requirement is particularly relevant for data-driven approaches, since highly complex machine-learning models can achieve high predictive performance but may at the same time become too large, too slow, or too resource-intensive for practical deployment on embedded or smart devices.2,3 For this reason, digital twins for electric-vehicle applications must combine application-oriented accuracy with compact model structure and fast inference under highly dynamic operating conditions.2,3
Beyond the present temperature-prediction task, digital twins (DTs) are becoming increasingly important in various industries.8,9 The use of DTs in product development can significantly increase the speed of development, as tasks from later development phases can be addressed at an earlier stage. In addition, DTs increase the information content available during the various development phases, as errors can be detected and avoided at an early stage with the help of comprehensive simulation models. Furthermore, DTs make it possible to test in a virtual environment and thus expand the available test capacities.2,3
Research objectives and paper structure
The overall objective of this research is to develop and evaluate a data-driven digital twin for predicting the temperature behavior of an electric machine under highly dynamic vehicle load conditions. To this end, various machine learning approaches, in particular regression models and time series prediction methods, are systematically investigated. First, a comprehensive database is generated on a full-vehicle road-to-rig test bench under defined but application-relevant load conditions. This database forms the basis for the subsequent development and evaluation of the different machine-learning approaches. Essentially, two types of machine learning methods are investigated.
First, regression models are applied to predict temperatures step by step for individual points in time. Second, in addition to these static regression models, the suitability of dynamic time series models for this application is examined in more detail.
The main difference between the two types of machine learning is that regression models predict each observation independently of past values, while dynamic time series models explicitly take temporal dependencies into account. 10 Existing work on data-driven temperature prediction often focuses either on prediction accuracy at simplified operating conditions or on single model classes without systematically addressing their suitability for embedded digital-twin applications under realistic vehicle dynamics.2,3 In contrast, the present work aims not only to identify an accurate model, but also to quantify the trade-off between prediction performance and practical deployability in a vehicle-side digital twin. For this reason, the investigated models are evaluated using criteria that are relevant from both a methodological and an application perspective. Prediction accuracy is assessed by the mean and maximum absolute and relative errors, since these measures directly reflect the thermal deviation that is relevant for monitoring and protection tasks. In addition, temporal computational complexity is evaluated by prediction speed, test time, and training time. These metrics describe, respectively, the suitability for cyclic inference, the computational effort during model application, and the offline effort required during development and recalibration. Model size is considered as an additional criterion because limited memory resources must be taken into account for future implementation on embedded or smart devices. In this way, the evaluation reflects both the predictive quality and the application-oriented feasibility of the developed digital-twin approaches.
This paper is divided into five main chapters. Chapter 1 presents the motivation, the problem statement, and the research objectives, and explains the structure of the paper. Chapter 2 provides an overview of the state of the art in the thermal behavior of electrical machines, classical thermal modeling approaches, and data-driven methods of temperature prediction. Chapter 3 describes the experimental setup, data acquisition, repeatability testing, data preprocessing, and the machine learning models used. Chapter 4 presents the results of the modeling, compares the static regression and dynamic time series models, and discusses the limitations of the approaches. Chapter 5 summarizes the most important findings, critically examines them, and provides an outlook on future research.
State of the art
Thermal behavior of electric machines
The thermal behavior of electrical machines is described by various characteristic temperature variables, in particular by housing, stator, and rotor temperatures. While the housing temperature is most commonly measured in industrial applications due to the ease of sensor integration, it is only an indirect indicator of internal thermal loads. 11 Direct measurement of the stator winding is technically possible, but more complex, while the rotor temperature can usually only be measured with great effort due to the rotating structure and is therefore rarely available in series production.11,12
The temperature development of electrical machines results primarily from internal loss mechanisms. Copper losses arise from ohmic losses in the windings and increase quadratically with the current. 13 Iron losses consist of hysteresis and eddy current losses and are highly dependent on frequency and material. 13 In addition, mechanical losses, particularly due to bearing friction and aerodynamic effects, contribute to heating. 11 In vehicle operation, dynamic load changes lead to pronounced transient temperature profiles, which further influence thermal behavior. 14 The operating temperature is highly relevant to the lifespan and performance of electrical machines. Elevated temperatures accelerate the aging of winding insulation and bearings and are one of the most common causes of failure.11,15 To comply with thermal limits, machine performance is often limited (derating) at high temperatures, which has a direct impact on the possible uses of drive trains. 14 Temperature-based condition monitoring therefore serves as a key means of assessing machine condition.11,16
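As a compact illustration of the loss mechanisms named above, the following textbook relations can be written down. All symbols are assumptions introduced here for illustration and are not taken from the measurements of this work: m is the number of phases, I the RMS phase current, R(T) the phase resistance at winding temperature T, R_0 the resistance at the reference temperature T_0, \alpha_{\mathrm{Cu}} the temperature coefficient of copper, f the electrical frequency, \hat{B} the peak flux density, and k_h, k_e, \beta material-dependent coefficients of a Steinmetz-type iron loss approximation.

```latex
\begin{align}
P_{\mathrm{Cu}} &= m \, I^{2} \, R(T), &
R(T) &= R_{0}\,\bigl(1 + \alpha_{\mathrm{Cu}}\,(T - T_{0})\bigr), \\
P_{\mathrm{Fe}} &\approx \underbrace{k_{h}\, f\, \hat{B}^{\beta}}_{\text{hysteresis}}
                 + \underbrace{k_{e}\, f^{2}\, \hat{B}^{2}}_{\text{eddy current}}
\end{align}
```

The first relation makes the quadratic current dependence of the copper losses explicit and shows that the losses themselves rise with winding temperature; the second reflects the frequency and material dependence of the iron losses.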
Classical thermal modeling approaches
Classical thermal modeling approaches are used to describe the steady-state and transient temperature behavior of electrical machines and are an integral part of the design and validation phase. The aim of these models is to provide a physically consistent representation of internal heat generation and dissipation under defined operating and boundary conditions. 11
A widely used approach involves analytical and network-based models, in particular Lumped Parameter Thermal Networks (LPTN).17,18 These models approximate thermal behavior using discrete resistance and capacitance elements and enable efficient calculation with comparatively low computational effort.17,18 For constant operating points and known load profiles, such models can achieve good prediction accuracy, while the mapping of highly dynamic effects is only possible to a limited extent.17,18 Numerical models based on the finite element method enable detailed spatial resolution of the temperature distribution and are often coupled with electromagnetic or fluid dynamics simulations.17,18 With correct parameterization, very high accuracies can be achieved, which makes them particularly suitable for development and validation tasks.17,18 However, this is offset by high computing costs and a lack of real-time capability.
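To make the resistance and capacitance analogy of an LPTN concrete, the following minimal sketch integrates a single thermal node with explicit Euler steps. It is a deliberately reduced illustration, not one of the networks from the cited literature; the function name and all parameter values (thermal resistance, thermal capacitance, loss power) are hypothetical.

```python
def simulate_single_node_lptn(p_loss_w, t_amb_c, r_th_k_per_w, c_th_j_per_k,
                              t_start_c, dt_s=1.0):
    """Integrate C_th * dT/dt = P_loss - (T - T_amb) / R_th with explicit Euler.

    p_loss_w is a sequence of loss powers in W, one value per time step.
    Returns the temperature trace in deg C, including the start value.
    """
    temps = [t_start_c]
    for p_loss in p_loss_w:
        t_now = temps[-1]
        # Net heat flow into the node divided by its thermal capacitance
        dT_dt = (p_loss - (t_now - t_amb_c) / r_th_k_per_w) / c_th_j_per_k
        temps.append(t_now + dt_s * dT_dt)
    return temps


# Hypothetical parameters: time constant R_th * C_th = 300 s
R_TH = 0.05      # K/W
C_TH = 6000.0    # J/K
T_AMB = 21.0     # deg C
losses = [1000.0] * 6000  # constant 1 kW loss for 6000 s

trace = simulate_single_node_lptn(losses, T_AMB, R_TH, C_TH, t_start_c=T_AMB)
steady_state_c = T_AMB + 1000.0 * R_TH  # analytical steady state for constant loss
print(round(trace[-1], 3), steady_state_c)
```

A real LPTN couples many such nodes (winding, teeth, yoke, housing, coolant) and needs the heat transfer coefficients discussed in the following paragraph, which is where most of the parameterization effort arises.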
The achievable model quality of classical approaches is heavily dependent on the availability of accurate geometry, material, and boundary condition data. In particular, the determination of heat transfer coefficients requires detailed system knowledge and experimental validation.17–19 Several studies show that although high accuracies can be achieved, the last few percentage points require a disproportionately increasing amount of modeling and parameterization effort, which limits their practical applicability in vehicle operation.17–19 These limitations motivate alternative approaches to temperature-based prediction.
Data-driven temperature prediction
There are various approaches to data-driven temperature prediction. From a methodological perspective, recurrent neural networks such as long short-term memory (LSTM) and gated recurrent units (GRU) are established approaches for nonlinear time-series prediction. Their suitability has also been shown in other forecasting domains, such as hydrological, meteorological, and solar irradiance prediction tasks.20–22 These studies indicate that recurrent architectures can capture temporal dependencies in dynamic systems. However, the following review focuses on data-driven temperature prediction for electric machines and vehicle-related thermal applications.
Lee et al. 23 use a feedforward network for prediction at quasi-stationary load points (20–130 °C) and report an absolute error of 4.5 K and an MAE of 0.9 K; however, because the load points were scanned schematically, pattern recognition cannot be ruled out, which may limit the significance of the results. 23 Hosseini et al. 24 compare CNN and LSTM models for predicting the temperatures of several PMSM components (tooth, yoke, winding, magnet) with EWMA/EWMS conditioning; the CNN achieves an MSE of 2.64 °C² and R² ≈ 0.992, but generalization remains open. 24 Yutthanawa et al. 25 show low MAE for winding temperature (4.651 K) and tooth temperature (1.902 K), but higher deviations for magnet temperature (8.054 K). 25 Jung et al. 26 estimate stator and magnet temperatures based on energy with maximum errors of <5 °C and <10 °C, respectively. 26 Multi-output networks and real-time approaches extend virtual sensor technology to multiple components.27,28
In Ref. 29, a digital twin for predicting the temperature of electric machines is presented. A virtual machine model with continuously available operating data is developed, which predicts temperatures based on real load and driving cycles. The central approach is a drive cycle-based prediction of the motor temperature using electromagnetic-thermal coupling. 29 Zhang et al. (2025) 30 likewise present a digital twin-supported, predictive thermal model for online monitoring of the stator temperature of induction machines, including experimental validation. 30 Limitations in both studies lie in the parameterization and in model robustness under changing boundary conditions.29,30
Ramones et al. developed a thermal neural network for real-time temperature estimation of a 48 V permanent magnet synchronous machine based on measurement data from diverse drive cycles under realistic test-bench operation. 31 Zhang et al. presented a digital twin-enabled predictive thermal model for stator temperature monitoring of a three-phase asynchronous motor, combining co-simulation with experimental prototype validation. 30 Aslan compared different machine learning-based regression models for PMSM temperature prediction and showed that such data-driven approaches can achieve high prediction accuracy for early fault detection. 32 Overall, these studies demonstrate that current data-driven approaches already provide promising results for electric machine temperature prediction. However, they are mainly focused on machine-centered or test-bench-based investigations and are not explicitly embedded in a real full-vehicle context.
The existing work presented above addresses temperature models that either operate at stationary operating points or remain at the component level. In contrast, the data-driven digital twin developed in this work predicts the temperature behavior under dynamic load changes, and its physical counterpart is embedded in a complex overall system, namely a complete vehicle.
Materials and methods
Experimental setup and data acquisition
The investigations presented in this publication were carried out on a full-vehicle test bench. This is a multiconfigurational test bench on which individual components, complete powertrains, and vehicles can be tested under reproducible boundary conditions. For this work, a vehicle was adapted to the test bench. The test bench consists of four asynchronous machines, each with a rated power of 440 kW, which simulate the road loads at the wheel hub positions. The test setup follows the road-to-rig principle: the vehicle's standard wheels are replaced with special test rims that incorporate a torsion-resistant mechanical drive-through unit for coupling with the test bench machines. The vehicle is fixed in place without tension using a floating mounting system, so that the chassis kinematics and structural behavior remain as realistic as possible, while all drive and braking torques can be applied via the wheel machines. The environmental conditions in the test room are defined by active test-cell temperature conditioning and additional ambient monitoring. The test-cell temperature was controlled to 21 °C by the room ventilation conditioning system. Further ambient parameters, including relative humidity, absolute humidity, and air pressure, were continuously monitored by an indoor climate station installed in the test room, but were not actively controlled. In this way, the most relevant thermal boundary condition for the investigated temperature behavior was kept constant, while additional ambient conditions were documented during the measurements.
The test vehicle is a fully electric luxury SUV with all-wheel drive and a system output of 265 kW. The drive architecture is based on an electric platform with axle-separated water-cooled permanent magnet synchronous motors (2× PMSM), which enable variable torque management between the front and rear axles. In normal operation, the rear axle is permanently electrically driven, while the front axle can be switched on as required and decoupled via a mechanical disconnect unit. Modifications were limited to the installation of additional measuring systems for recording thermal and vibration parameters of the rear PMSM. The vehicle was tested as a complete series-production vehicle, including its original vehicle-side thermal management and cooling system. No external cooling circuit was added to the investigated electric machine. Instead, the convective cooling conditions at vehicle level were reproduced by a frontal air-flow blower. The air-flow blower was controlled as a function of the simulated vehicle speed. Thus, the vehicle and its cooling system were operated under reproducible road-to-rig boundary conditions, while the speed-dependent air flow around the vehicle was represented in a test-bench environment.
As shown schematically in Figure 1, the test bench is equipped with extensive measurement technology provided by the manufacturer. Sensors for monitoring key operating parameters are installed on the four wheel machines; they ensure operational safety and enable realistic driving and load scenarios to be simulated. In addition, speed sensors (Heidenhain ROD 486) and torque sensors (HBM T40B) are installed on the wheel load machines.

Test bench setup and sensor concept based on Mercedes-Benz (2024). 33
The rear PMSM of the vehicle is additionally equipped with external sensors. Six resistance thermometers (PT100) were mounted at defined positions on the stator housing to record the housing temperature distribution under various load conditions. In addition, the vehicle speed and the road gradient, which serve as the setpoint (target) variables of the load points, were measured. All measurement channels were recorded synchronously at a sampling rate of 50 Hz. A total of 150 stationary load points were run on the test bench for training and testing the models, as shown in Figure 2, where the vehicle speed is plotted on the x-axis and the road gradient on the y-axis. The load points were selected to fully cover the speed/gradient range permitted by German traffic and road construction guidelines and to represent typical operating conditions for an electric vehicle.

Load collective for training and test data.
Each load point consists of a 10-min load phase followed by a 10-min cooling phase at a standstill. The order of the points was randomized to minimize sequence effects and systematic heating influences. Due to the limited range of the vehicle, the 150 load points were divided into ten load collectives, each with 15 points. A total of around 50 h of measurement data was recorded, corresponding to a distance traveled of approximately 1838 km. This resulted in approximately 9,000,000 raw measurement points based on 150 load points, a duration of 20 min per load point, and a sampling rate of 50 Hz.
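The stated data volume follows directly from the test plan. The short calculation below reproduces it from the figures given above; the 1 Hz rate is the downsampled rate used later in the preprocessing, and interpreting the counts as samples per measurement channel is an assumption made here.

```python
LOAD_POINTS = 150
MINUTES_PER_LOAD_POINT = 20   # 10 min load phase + 10 min cooling phase
RAW_SAMPLING_RATE_HZ = 50
DOWNSAMPLED_RATE_HZ = 1

total_seconds = LOAD_POINTS * MINUTES_PER_LOAD_POINT * 60
total_hours = total_seconds / 3600
raw_samples = total_seconds * RAW_SAMPLING_RATE_HZ      # per channel (assumption)
samples_1hz = total_seconds * DOWNSAMPLED_RATE_HZ       # per channel after downsampling

print(total_hours, raw_samples, samples_1hz)  # 50.0 9000000 180000
```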
Repeatability test
As stated, six PT100 sensors were attached to thermally relevant positions on the outer housing. In order to obtain a single representative temperature signal for model training, all six measurement data sets were preprocessed and combined into one representative measurement. For each time step, the mean of all valid housing sensors was calculated. The resulting average housing temperature was then used as the target variable for all machine learning models. This procedure preserves the global thermal trend of the machine while avoiding the need to explicitly model local temperature inhomogeneities.
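The sensor aggregation described above can be sketched as follows. The source does not state how a sensor reading is judged valid, so the criterion used here (a reading is invalid if it is `None` or NaN) and the function name are assumptions for illustration only.

```python
import math


def representative_housing_temperature(samples):
    """Average all valid housing sensors per time step.

    samples: list of time steps, each a list of sensor readings in deg C.
    A reading counts as invalid if it is None or NaN (assumed criterion).
    Returns one mean value per time step, NaN if no sensor is valid.
    """
    means = []
    for readings in samples:
        valid = [r for r in readings if r is not None and not math.isnan(r)]
        means.append(sum(valid) / len(valid) if valid else float("nan"))
    return means


# Two time steps with six PT100 channels; the second has two invalid channels
example = [
    [40.0, 41.0, 42.0, 43.0, 44.0, 45.0],
    [40.0, None, float("nan"), 42.0, 41.0, 41.0],
]
print(representative_housing_temperature(example))  # [42.5, 41.0]
```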
The quality and structure of the input data form a central basis for the performance of data-driven models. To ensure the reproducibility of the relevant metrics, targeted repeatability tests were therefore carried out. For this purpose, 13 complete WLTC cycles were run, for which the test bench and vehicle were preconditioned identically (identical starting temperatures and boundary conditions).
The evaluation of the 13 WLTC runs shows a high degree of repeatability. Figure 3 shows the temporal temperature behavior of the repeated WLTC cycles, while Table 1 summarizes the statistical repeatability analysis of the relevant input and output variables. For the representative housing temperature, the mean standard deviation across the repeated WLTC cycles was 0.76 K, the 95th percentile standard deviation was 0.88 K, and the maximum standard deviation was 0.89 K. The maximum value deviation range was 2.90 K. These values indicate that the thermal response of the test setup is highly reproducible and sufficiently stable for the subsequent development and evaluation of data-driven temperature prediction models.

Temperature behavior of the 13 repeated load cycles (WLTC).
Statistical repeatability analysis of input and output parameters.
Table 1 provides descriptive repeatability statistics for the measured input and output variables during the WLTC repeatability tests. The listed quantities include the global value range, the mean standard deviation, the 95th percentile of the standard deviation (P95 standard deviation), the maximum standard deviation, and the maximum value deviation range across the repeated cycles. Road gradient is not listed separately because the WLTC repeatability runs were performed at a constant road gradient of 0%, so that no cycle-to-cycle variation of this parameter occurred in the repeatability test. The load effect relevant for the drivetrain is instead reflected by the measured wheel torque signals.
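One plausible way to obtain the listed repeatability quantities is sketched below. It assumes that the standard deviation and the value range are computed across the repeated cycles at each time step and then aggregated over time; the source does not spell out this procedure, and the sample standard deviation and the nearest-rank percentile are likewise choices made here.

```python
import math
import statistics


def percentile_nearest_rank(values, q):
    """q-th percentile (0 < q <= 100) using the nearest-rank definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def repeatability_statistics(cycles):
    """cycles: list of equally long, time-aligned signal traces (one per run)."""
    n_steps = len(cycles[0])
    if any(len(c) != n_steps for c in cycles):
        raise ValueError("all cycles must have the same length")
    std_per_step, range_per_step = [], []
    for k in range(n_steps):
        column = [c[k] for c in cycles]          # all runs at time step k
        std_per_step.append(statistics.stdev(column))
        range_per_step.append(max(column) - min(column))
    return {
        "mean_std": statistics.fmean(std_per_step),
        "p95_std": percentile_nearest_rank(std_per_step, 95),
        "max_std": max(std_per_step),
        "max_range": max(range_per_step),
    }


# Synthetic three-run example, not measurement data
demo = repeatability_statistics([[20, 30, 40], [22, 30, 44], [24, 30, 48]])
print(demo)
```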
Data preprocessing and feature selection
Modeling the motor temperature required several stages of preprocessing of the measurement data. To reduce measurement noise and obtain a smooth signal, a moving mean was applied first. It was combined directly with downsampling, so that for each second of signal the mean of the 50 raw values was calculated and stored. This resulted in a filtered downsampling from 50 to 1 Hz. This procedure suppresses non-physical signal fluctuations, reduces high-frequency signal components, and still represents the time scales relevant for the thermal process. The same preprocessing procedure was applied to all investigated model approaches. Thus, the Gaussian process regression (GPR), LSTM, and GRU models were trained and tested on the same data basis, which ensures that the model comparison is not influenced by different preprocessing strategies. The downsampling from 50 to 1 Hz was considered suitable for the present temperature prediction task because the thermal behavior of the electric machine is comparatively slow. Short-term peaks in mechanical or electrical load do not cause immediate step changes in the housing temperature, but are thermally integrated by the machine structure. Therefore, the filtered 1 Hz signal preserves the relevant thermal dynamics while reducing measurement noise, data volume, and computational effort.
The filtered signals were then subjected to systematic screening and plausibility checks. In this step, implausible values, outliers, and missing data points were identified and, depending on the cause, corrected or excluded from further evaluation. The input variables considered include vehicle speed, road gradient, and the individual and aggregated drive variables of the four wheel machines (speed and torque). The target variable of the model is the housing temperature of the rear electric motor measured on the test bench. The selected inputs were intentionally limited to vehicle-plausible operating variables. In this way, the model captures the dominant load-dependent excitation of the thermal behavior while maintaining applicability for later in-vehicle use.
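The combined filtering and downsampling step amounts to a non-overlapping block mean. A minimal sketch is given below; the function name is hypothetical, and dropping an incomplete trailing block is an assumption, since the handling of partial seconds is not described.

```python
def block_mean_downsample(signal, factor=50):
    """Average consecutive blocks of `factor` samples (50 Hz -> 1 Hz for factor=50).

    An incomplete block at the end of the signal is dropped (assumption).
    """
    n_blocks = len(signal) // factor
    return [
        sum(signal[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


# Synthetic 50 Hz signal: 60 deg C with alternating +/-1 K noise for two seconds
noisy = [59.0, 61.0] * 50
print(block_mean_downsample(noisy))  # [60.0, 60.0]
```

Because each output value is the mean of one full second, fluctuations faster than 1 Hz are strongly attenuated while the slow thermal trend is retained.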
For the division into training and test data, a split of 80% training data and 20% test data of the complete 150 load points was chosen on the basis of previous investigations.34,35 This ratio provides a sufficiently large training set for model development and cross-validation, while retaining an independent data subset for the final performance evaluation. Specifically, eight load collectives with a total of 120 load points were used for training and two load collectives with 30 load points were used for testing the models. The split was performed at the level of complete load collectives and load points, not at the level of individual time samples. This avoids a direct temporal mixing of highly correlated neighboring data points between training and test data. To reduce the risk of overfitting, cross-validation was applied during model development, and the final model performance was additionally evaluated on a separate 20% hold-out test set. 10
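A split at the level of complete load collectives can be sketched as shown below. Which two collectives were actually held out is not stated in the text, so the random selection with a fixed seed, the identifiers, and the function name are illustrative assumptions.

```python
import random


def split_by_collective(collective_ids, n_test, seed=0):
    """Assign whole load collectives to either the training or the test set."""
    unique_ids = sorted(set(collective_ids))
    rng = random.Random(seed)
    test_ids = set(rng.sample(unique_ids, n_test))
    train_ids = set(unique_ids) - test_ids
    return train_ids, test_ids


# Ten load collectives with 15 load points each, as (collective, point) pairs
load_points = [(c, p) for c in range(1, 11) for p in range(15)]
train_ids, test_ids = split_by_collective([c for c, _ in load_points], n_test=2)

train_points = [lp for lp in load_points if lp[0] in train_ids]
test_points = [lp for lp in load_points if lp[0] in test_ids]
print(len(train_points), len(test_points))  # 120 30
```

Since every time sample inherits the assignment of its load collective, neighboring and therefore highly correlated samples never end up on both sides of the split.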
To obtain an initial assessment of the relationships between the measured variables, a linear correlation analysis was carried out for all relevant input and target signals. The results are summarized in Figure 4 as a correlation matrix. The correlation coefficients are shown as absolute values on a scale from 0 to 1, where 0 indicates no linear correlation and 1 indicates the strongest linear correlation. This representation provides a compact overview of how strongly the individual variables in the dataset are linearly related to each other and to the target variable, that is, the housing temperature. In the present dataset, road gradient and several torque signals show a higher linear correlation with the target temperature, whereas vehicle speed and the rotational speed signals exhibit only weak linear correlations with the housing temperature.

Correlation matrix of measurement data.
Nevertheless, vehicle speed and rotational speeds were retained in the final feature set. The reason is that the correlation matrix only evaluates pairwise linear dependencies and therefore cannot capture nonlinear relationships, interaction effects, or time-dependent influences. In addition, vehicle speed and rotational speeds remain physically meaningful operating variables. They describe the kinematic state of the drivetrain and may still contain relevant information for data-driven models, even if their isolated linear correlation with the target variable is low. Their inclusion therefore supports both the physical interpretability of the model input and the representation of operating states beyond purely linear effects.
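The limitation of a purely linear correlation measure can be illustrated with synthetic data. The sketch below computes the absolute Pearson coefficient, as used for the correlation matrix, for a linear and for a quadratic dependency; the data are artificial and do not stem from the measurements.

```python
import math


def abs_pearson(x, y):
    """Absolute value of the Pearson correlation coefficient of two sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return abs(cov) / math.sqrt(var_x * var_y)


x = [float(v) for v in range(11)]           # synthetic input, 0 ... 10
y_linear = [-2.0 * v + 3.0 for v in x]      # perfectly linear dependency
y_quadratic = [(v - 5.0) ** 2 for v in x]   # fully determined by x, but nonlinear

print(abs_pearson(x, y_linear), abs_pearson(x, y_quadratic))
```

The quadratic signal is completely determined by the input, yet its linear correlation vanishes, which is why a low entry in the correlation matrix alone does not justify removing a feature.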
Modeling approaches and machine learning models
Three modeling approaches were chosen for the temperature forecast. On the one hand, regression models with a fixed forecast horizon were used; on the other hand, recurrent neural networks based on LSTM and GRU were used for modeling complete time series. Data preprocessing, training, testing, and evaluation were implemented entirely in MATLAB. The modeling approaches were implemented using the MATLAB Regression Learner app, the Deep Network Designer app, and the Neural Net Time Series app.
A regression model describes the functional dependence of a continuous target variable y on one or more predictor variables x via a mapping function y ≈ f(x). The model parameters are determined in such a way that the predictions approximate the training data as closely as possible, typically by minimizing an error measure. Once training is complete, the model can be used to make predictions for new operating points. The MATLAB Regression Learner was used to systematically select a suitable regression model. A total of 31 model variants were configured and compared with each other, including linear models, decision trees, ensemble methods, support vector machines, Gaussian process regression (GPR) models, and neural networks. The temperature forecast was realized with a horizon of 10 min: based on a measured starting value, the model determines the temperature curve for the next 10 min; the model is then reinitialized with the newly measured temperature value, and the next 10-min segment is forecast. The modeling approach is shown in Figure 5.

Model approach #1 (cascaded GPR).
The hyperparameter tuning was performed automatically within the MATLAB Regression Learner environment, as this represents a standardized and reproducible procedure. In principle, more specialized optimization methods, such as Bayesian optimization, grid search, or random search, could further refine the selected hyperparameters and potentially improve the prediction accuracy. However, given the already high model performance achieved in this study and the increasing implementation effort for the cascaded forecast structure, such extended optimization procedures were considered beyond the scope of the present work. Based on the 31 implemented model variants, the authors empirically identified the most suitable model according to the achieved prediction performance. This selected model was then used as the basis for training the ten cascaded prediction steps applied in the subsequent forecast procedure. Since the implementation effort increased with each additional prediction step due to manual data preprocessing, prediction generation, and training, model approach #1 was limited to a forecast horizon of 10 min. For longer forecast horizons, other model structures are more suitable. These are presented and discussed in the following sections.
In addition to these segmented regression models, direct modeling of the complete time history was investigated. Because the thermal behavior of electrical machines in vehicle operation is highly dynamic, time series methods that explicitly derive future values from the signal history are well suited. In this way, heating and cooling phases can be consistently mapped over entire load cycles without having to artificially divide them into discrete time windows.
LSTM networks (see Figure 6) represent an extended class of recurrent neural networks. They have explicit memory cells and three gate structures (input, forget, and output gates) that allow both short-term and long-term dependencies to be modeled reliably. This is particularly relevant for temperature prediction in electrical machines, as thermal dynamics are characterized by superimposed load and cooling phases with different time constants. The mean squared error (MSE) was used as the loss function; the hyperparameters were tuned in a data-driven manner, analogous to the GRU model described below.
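The cascaded forecast with periodic re-initialization can be sketched as follows. This is a structural illustration only: the source states a 10-min horizon and ten cascaded prediction steps, whereas the assumptions that each step covers one minute, that each step model receives the previous prediction as an input, and the toy step model itself are introduced here and do not represent the trained GPR models.

```python
def cascaded_forecast(measured_temps, features, step_models):
    """Forecast segment by segment, re-initializing from a measurement per segment.

    measured_temps[k] : measured temperature at step k (only read at segment starts)
    features[k]       : model inputs for the transition from step k to step k + 1
    step_models       : one callable per cascade step,
                        model(previous_temp, features) -> next temperature
    Returns predictions, where predictions[k] is the forecast for step k + 1.
    """
    steps_per_segment = len(step_models)
    predictions = []
    current = None
    for k, x in enumerate(features):
        position = k % steps_per_segment
        if position == 0:
            current = measured_temps[k]   # re-initialize with a measured value
        current = step_models[position](current, x)
        predictions.append(current)
    return predictions


# Toy stand-in for a trained step model: temperature rises by the feature value
def toy_step(previous_temp, heating):
    return previous_temp + heating


measured = [10.0, 99.0, 99.0, 20.0, 99.0, 99.0]   # only indices 0 and 3 are used
heating = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
forecast = cascaded_forecast(measured, heating, [toy_step] * 3)
print(forecast)  # [11.0, 12.0, 13.0, 22.0, 24.0, 26.0]
```

Within a segment, errors of earlier steps propagate into later ones; the re-initialization bounds this accumulation, which is consistent with the high short-horizon accuracy and the limited horizon of this approach.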

Figure 6. Model approach #2 (LSTM).
The GRU model approach, shown in Figure 7, is also a recurrent neural network developed specifically for processing sequential data. Using update and reset gates, the GRU controls which parts of past information are retained or discarded, enabling it to capture long-term dependencies and map nonlinear thermal processes. Compared to LSTM networks, the GRU has a lower architectural complexity with similar prediction quality, allowing training and validation to be performed with a lower computational load. The mean squared error (MSE) was used as the loss function; hyperparameters such as neuron count, learning rate, and batch size were optimized empirically.

Figure 7. Model approach #3 (GRU).
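The lower architectural complexity of the GRU compared with the LSTM can be made tangible by counting the trainable parameters of a single recurrent layer. The following is a minimal sketch, not the authors' implementation. It assumes one bias vector per gate block (some frameworks use two), uses the 64 hidden units reported for the implemented networks, and uses a hypothetical input count `NUM_INPUTS`, since the number of input features is not stated in this section. The function name is illustrative.

```python
def recurrent_parameter_count(num_inputs, num_hidden, num_gate_blocks):
    """Count weights and biases of one recurrent layer.

    Assumption: each gate block has input weights, recurrent weights,
    and a single bias vector.
    """
    input_weights = num_gate_blocks * num_hidden * num_inputs
    recurrent_weights = num_gate_blocks * num_hidden * num_hidden
    biases = num_gate_blocks * num_hidden
    return input_weights + recurrent_weights + biases


NUM_HIDDEN = 64  # hidden units reported in the text
NUM_INPUTS = 8   # hypothetical feature count, for illustration only

# LSTM: input gate, forget gate, output gate, and cell candidate (4 blocks).
lstm_params = recurrent_parameter_count(NUM_INPUTS, NUM_HIDDEN, 4)
# GRU: update gate, reset gate, and candidate state (3 blocks).
gru_params = recurrent_parameter_count(NUM_INPUTS, NUM_HIDDEN, 3)

print(f"LSTM layer parameters: {lstm_params}")
print(f"GRU layer parameters:  {gru_params}")
print(f"Ratio LSTM/GRU:        {lstm_params / gru_params:.3f}")
```

Under these assumptions, the LSTM layer holds 4/3 as many parameters as the GRU layer, independent of the input count. The ratio of the model sizes reported in the evaluation (130,300 to 98,700 bytes, about 1.32) is of similar magnitude, although a stored model also contains the remaining layers and format overhead.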
The implemented LSTM and GRU architectures each consisted of a sequence input layer, one recurrent layer with 64 hidden units, a dropout layer with a dropout rate of 0.05, a fully connected layer, and a regression output layer using the mean squared error as loss function. The recurrent layer used the standard built-in activation functions of the MATLAB LSTM/GRU implementation, that is, sigmoid functions in the gates and a tanh-based state activation. Thus, a single-layer LSTM/GRU architecture was applied without additional intermediate recurrent layers. The selected architecture and training parameters were determined empirically. Training was performed using the Adam optimizer with a maximum of 80 epochs, an initial learning rate of 1 × 10^−3, a gradient threshold of 1.0, and a mini-batch size of 8. The validation dataset was monitored every 20 iterations, and training was stopped early after 10 validation checks without improvement. The training data were processed without shuffling in order to preserve the temporal structure of the sequences.
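The early-stopping rule described above can be illustrated with a generic patience counter. This is a sketch under stated assumptions, not the MATLAB training loop: validation checks are assumed to occur at iterations 20, 40, and so on, an improvement is assumed to mean a strictly lower validation loss, and the counter is assumed to reset on every improvement. The function name and the loss values are hypothetical.

```python
def early_stopping_iteration(val_losses, patience=10, val_frequency=20):
    """Return the iteration at which training stops, or None.

    val_losses: validation loss at each validation check, in order.
    Assumption: a check counts as "no improvement" when its loss is not
    strictly lower than the best loss seen so far.
    """
    best_loss = float("inf")
    checks_without_improvement = 0
    for check_index, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            checks_without_improvement = 0
        else:
            checks_without_improvement += 1
        if checks_without_improvement >= patience:
            return check_index * val_frequency
    return None  # patience never exhausted within the given checks


# Hypothetical validation losses: five improving checks, then a plateau.
example_losses = [5.0, 4.0, 3.0, 2.0, 1.0] + [1.0] * 10
stop_iteration = early_stopping_iteration(example_losses)
print(stop_iteration)
```

In this example, the tenth check without improvement is the fifteenth validation check overall, so training would stop at iteration 300. The exact counting convention of a given toolbox may differ from this sketch.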
Evaluation and discussion
Model comparison and interpretation
Table 2 summarizes the key parameters for the three model approaches examined: the GPR model with a forecast horizon of 10 min and the time series models based on GRU and LSTM networks with a horizon of 200 min; the different horizons result from the different algorithmic approaches. The metrics listed cover prediction accuracy (MAE, MaxAE, MRE, and MaxRE), computational effort (prediction speed, test time, and training time), and model size as a measure of usability on embedded control units. These variables are particularly relevant for DTs in vehicles, as both real-time capability and limited memory resources must be taken into account in addition to prediction quality.
Table 2. Prediction performance of different modeling approaches.
Training time differs only slightly across the three approaches. GPR completes training in 50.56 s, while LSTM and GRU require 57.32 and 56.95 s, respectively. These differences are small and are unlikely to drive model selection when training is performed offline. In contrast, model size separates the methods clearly. GPR is compact at 41,904 bytes, GRU increases to 98,700 bytes, and LSTM reaches 130,300 bytes. This gap is not trivial, because embedded deployment requires persistent storage and runtime availability of the parameters. The test time shows a similar ordering. GPR has the lowest test time at 0.00299 s, whereas LSTM and GRU require 0.52 and 0.58 s, respectively. This difference matters when predictions must be computed cyclically under tight real-time constraints. At the same time, LSTM and GRU provide the highest throughput: LSTM reaches 46,936 obs./s and GRU 42,426 obs./s, while GPR reaches 17,834 obs./s. Test time and throughput must therefore be interpreted with care, because the test runs are not directly comparable: GPR targets a 10 min horizon, whereas LSTM and GRU cover trajectories of up to 200 min.
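Why test time and throughput point in opposite directions becomes clearer when both are combined. The following sketch multiplies the reported prediction speed by the reported test time to estimate how many observations each test run covered. It assumes that prediction speed was computed as observations divided by test time, which the text does not state explicitly, so the resulting counts are rough estimates derived from rounded values.

```python
# Values reported in the text (test time in s, prediction speed in obs./s).
reported = {
    "GPR": {"test_time_s": 0.00299, "speed_obs_per_s": 17834},
    "LSTM": {"test_time_s": 0.52, "speed_obs_per_s": 46936},
    "GRU": {"test_time_s": 0.58, "speed_obs_per_s": 42426},
}


def implied_observations(test_time_s, speed_obs_per_s):
    """Observations implied by speed * time.

    Assumption: prediction speed = observations / test time.
    """
    return test_time_s * speed_obs_per_s


implied = {name: implied_observations(**vals) for name, vals in reported.items()}
for name, count in implied.items():
    print(f"{name}: about {count:.0f} observations per test run")
```

Under this assumption, the GPR test run corresponds to a few dozen observations, while the recurrent models process on the order of 24,000 observations per run. The short GPR test time therefore reflects a much smaller workload and not a higher per-observation speed.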
The error metrics differentiate the model approaches most clearly when the prediction horizons are considered separately. For the 10 min horizon, the GPR model achieves the lowest errors with a mean absolute error of 0.4 K, a maximum absolute error of 1.5 K, a mean relative error of 0.1%, and a maximum relative error of 0.3%. Under the same 10 min condition, the recurrent models show substantially higher errors. The LSTM reaches 9.2 K MAE and 34.6% MRE, while the GRU reaches 6.0 K MAE and 22.2% MRE. For the 200 min horizon, only the recurrent models were evaluated. Here, the average errors decrease markedly to 1.9 K and 5.0% for the LSTM and 2.0 K and 5.2% for the GRU, indicating a substantially improved average fit over the full sequence. However, the maximum errors remain considerably higher, with 19.5 K and 77.9% for the LSTM and 12.2 K and 51.1% for the GRU. This indicates that large local deviations still occur, particularly during demanding transient phases such as cold-start conditions.
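For reference, the four error metrics can be computed as in the following sketch. It assumes that relative errors are taken with respect to the measured value and expressed in percent; the reference quantity and temperature scale used for the relative errors are not stated in this section. The function name and the sample temperatures are hypothetical.

```python
def error_metrics(measured, predicted):
    """Return MAE, MaxAE, MRE, and MaxRE for two equally long sequences.

    Assumption: relative errors refer to the measured value, in percent.
    """
    if not measured or len(measured) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("relative error is undefined for a measured value of 0")
    abs_err = [abs(p - m) for m, p in zip(measured, predicted)]
    rel_err = [100.0 * e / abs(m) for e, m in zip(abs_err, measured)]
    n = len(abs_err)
    return {
        "MAE": sum(abs_err) / n,   # mean absolute error
        "MaxAE": max(abs_err),     # maximum absolute error
        "MRE": sum(rel_err) / n,   # mean relative error in percent
        "MaxRE": max(rel_err),     # maximum relative error in percent
    }


# Hypothetical temperatures in degrees Celsius, for illustration only.
measured = [40.0, 50.0, 60.0, 50.0]
predicted = [41.0, 49.0, 63.0, 50.0]
metrics = error_metrics(measured, predicted)
print(metrics)
```

Because MaxAE and MaxRE depend on a single sample, one poorly predicted transient can dominate them even when the mean errors over a long sequence are small.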
Figure 8 compares the temporal progression of the measured temperature and the predicted temperatures of all three model approaches within a common 10 min prediction frame. Within this short-term horizon, model approach 1 follows the measured curve very closely and shows only minor deviations over time. In contrast, model approaches 2 and 3 overestimate the temperature rise from the beginning of the sequence onward. The deviation is largest for model approach 2, while model approach 3 shows a smaller, but still clearly visible, overprediction. From an application perspective, the GPR-based approach therefore provides the highest short-term prediction accuracy among the investigated models. It should be noted that model approach 1 is particularly suitable for short-term prediction intervals. At the same time, Figure 8 indicates a slight increase in deviation toward the end of the prediction window, suggesting progressive error propagation within the cascaded forecast. Although the model achieves high accuracy within this short horizon, this performance cannot be assumed to remain unchanged for extended prediction horizons, since the recursive use of previous predictions would likely cause cumulative deviations, resulting in reduced prediction stability and overall accuracy.
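The effect of feeding predictions back into a forecast, and the benefit of periodic re-initialization, can be illustrated with a deliberately simple toy example. The sketch below does not use GPR and is not the cascaded structure of model approach 1. It assumes a hypothetical first-order temperature process and a one-step model with a slightly wrong coefficient; all names and numbers are illustrative.

```python
def simulate(a, b, t0, steps):
    """Toy first-order process: T[k+1] = a * T[k] + b."""
    temps = [t0]
    for _ in range(steps):
        temps.append(a * temps[-1] + b)
    return temps


def recursive_forecast(measured, a_hat, b_hat, reinit_every=None):
    """Apply a one-step model recursively.

    If reinit_every is set, the forecast restarts from the measured
    value every reinit_every steps instead of its own prediction.
    """
    preds = [measured[0]]
    for k in range(1, len(measured)):
        if reinit_every and (k - 1) % reinit_every == 0:
            previous = measured[k - 1]  # re-initialize from measurement
        else:
            previous = preds[-1]        # feed back the previous prediction
        preds.append(a_hat * previous + b_hat)
    return preds


def max_abs_error(measured, preds):
    return max(abs(p - m) for m, p in zip(measured, preds))


# Hypothetical process and a slightly mismatched model coefficient.
truth = simulate(a=0.95, b=4.0, t0=20.0, steps=120)
free_run = recursive_forecast(truth, a_hat=0.96, b_hat=4.0)
with_reinit = recursive_forecast(truth, a_hat=0.96, b_hat=4.0, reinit_every=10)

free_max = max_abs_error(truth, free_run)
reinit_max = max_abs_error(truth, with_reinit)
print(f"max error without re-initialization: {free_max:.1f} K")
print(f"max error with re-initialization:    {reinit_max:.1f} K")
```

In this toy setting, the free-running forecast accumulates the small one-step error over the whole sequence, whereas re-initialization bounds the error to what can build up within one window. The magnitudes are specific to the assumed numbers and say nothing about the errors of the actual models.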

Figure 8. Prediction accuracies of the different model approaches in a 10 min prediction frame.
Figures 9 and 10 show the temperature trajectories for the LSTM and GRU models over an entire load cycle of 200 min. Both models reproduce the overall trends of the heating and cooling phases. The largest deviations occur at the beginning of the sequences, when the models have to capture the cold-start dynamics from their respective initial states. Within approximately the first 30 min, the predicted curves converge toward the measured curves; in the subsequent phases, LSTM and GRU follow the measurement closely for the remainder of the cycle. Unlike the GPR model, no systematic error drift is apparent over the length of the time series; the forecast remains stable over 200 min.

Figure 9. Prediction accuracy of model approach #2 (LSTM).

Figure 10. Prediction accuracy of model approach #3 (GRU).
Overall, GPR has by far the highest accuracy and the lowest resource requirements for short prediction periods, making it particularly suitable as a high-precision, real-time digital twin approach. LSTM and GRU are designed for long time series and achieve very high prediction speeds, but at the cost of larger models and higher errors.
Limitations and discussion
The results show that the GRU and LSTM models are particularly well suited for longer-term predictions. In contrast, the regression models, in particular the GPR approach, deliver very good results for a short- and medium-term prediction horizon. It should be noted that the trained models are system-specific. The data used in this study were obtained from one vehicle, one rear electric machine, one thermal management system, and one road-to-rig test bench setup. Therefore, the trained models cannot be directly transferred to other electric machines, vehicle architectures, or cooling concepts. Such a transfer would require new measurement data and application-specific calibration or retraining. This is necessary because losses, heat transfer paths, thermal time constants, sensor positions, and control strategies can differ between systems. Nevertheless, the study provides relevant methodological value. It demonstrates a reproducible workflow for developing a data-driven thermal digital twin under full-vehicle road-to-rig conditions. It also quantifies the trade-off between accuracy, forecast horizon, model size, and computational effort for different model classes. Because a wide range of application-specific requirements plays a decisive role in selecting the optimal prediction model, a close examination of the performance metrics presented in Table 2 is recommended.
The maximum errors mainly occur during transient operating phases. These phases include the beginning and end of load sections as well as strong acceleration or deceleration events. In these situations, torque, rotational speed, and resulting losses change within a short time. The measured housing temperature does not change abruptly because of the thermal inertia of the electric machine. However, these phases are associated with the highest temperature gradients within the investigated cycles. This makes them a challenging learning region for the prediction models, because the models must capture the delayed thermal response after rapid load changes. From an application perspective, these local maximum errors must be considered when the model is used for thermal protection or limit monitoring. For development and model comparison, however, the maximum error should always be interpreted together with the complete temporal prediction behavior.
To achieve greater prediction accuracy, additional input parameters can be used, such as electrical and mechanical power, power losses, and cooling power. Depending on the measurement effort, the accessibility of the machine, and the available knowledge of its mechanical design, the modeling accuracy and overall performance can be increased further.
In summary, the housing temperature is a parameter that is frequently monitored in industry. 12 Based on this work, it can be concluded that the housing temperature of an electric machine is a relevant variable for temperature monitoring. However, in highly loaded or highly efficient electric machines, such as those used in generators, test benches, or electric vehicles, the exact stator winding temperature is often required to detect critical temperature conditions at an early stage and prevent them effectively. For this reason, such machines are often equipped with dedicated physical sensors, such as a stator winding temperature sensor.
However, since installing physical sensors in this area involves high financial and design costs, predicting these critical temperatures based on the housing temperature is a promising alternative. For example, the housing temperature can be used as a proxy variable to predict the stator winding temperature, since the two temperatures exhibit a very similar temporal progression and differ only in their amplitude. 35 This interpretation is supported by internal preliminary investigations, which showed a strong relationship between housing and stator winding temperature, with a Pearson correlation of 0.947, a Spearman correlation of 0.911 (p < 0.001), and a maximum cross-correlation at a lag of 0 s. Due to confidentiality restrictions, the present manuscript focuses on housing temperature as the published target variable. Given this strong correlation, the developed method for predicting the housing temperature can provide a suitable basis for future stator winding temperature prediction. 35 However, such a transfer requires application-specific calibration and validation and should not be interpreted as a direct one-to-one substitution.
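The kind of correlation analysis mentioned above can be sketched as follows. The example uses purely synthetic signals, not the confidential measurement data, and therefore does not reproduce the reported coefficients. It computes the Pearson correlation and searches for the lag with the highest lagged Pearson correlation as a simple stand-in for a cross-correlation analysis; the Spearman correlation is omitted for brevity. All names and signal parameters are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def best_lag(x, y, max_lag):
    """Lag (in samples) maximizing the correlation of x[t] and y[t + lag]."""
    best = (0, -2.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        r = pearson(xs, ys)
        if r > best[1]:
            best = (lag, r)
    return best


# Synthetic 1 Hz signals: the "stator" signal is a scaled and offset copy
# of the "housing" signal, i.e. same progression, different amplitude.
housing = [40.0 + 20.0 * math.sin(2.0 * math.pi * k / 50.0) for k in range(200)]
stator = [1.5 * h + 5.0 for h in housing]

r_pearson = pearson(housing, stator)
lag, r_at_lag = best_lag(housing, stator, max_lag=10)
print(f"Pearson correlation: {r_pearson:.3f}")
print(f"Lag of maximum correlation: {lag} samples")
```

At a sampling rate of 1 Hz, a lag in samples corresponds directly to a lag in seconds. For measured signals, the coefficient would be below 1 and the search range for the lag should be chosen according to the expected thermal delay.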
Conclusion and outlook
This work developed a data-driven digital twin to predict the temperature behavior of an electric machine under highly dynamic load conditions using measurements from a full-vehicle road-to-rig test bench. The signal set was filtered, down-sampled from 50 to 1 Hz, and limited to vehicle-plausible inputs to support later in-vehicle deployment. Three model classes were evaluated with application-relevant criteria covering prediction accuracy, computational effort, and memory demand. For short-horizon forecasting (10 min) with periodic re-initialization, the cascaded GPR model achieved the best overall performance, combining very high accuracy (MAE 0.44 K, MaxAE 1.47 K, very low relative errors) with the smallest model size (41,904 bytes) and the lowest training and test time. For long-sequence prediction over an entire 200 min load cycle, the recurrent models (LSTM and GRU) reproduced the heating and cooling trends and remained stable without systematic drift; their largest deviations occurred at the beginning of the sequences during cold-start dynamics before converging toward the measured trajectory. However, both recurrent approaches required larger models and showed higher maximum errors, with GRU reducing the maximum error compared to LSTM while retaining high prediction speed. Overall, the results quantify a clear trade-off between forecast horizon, accuracy, and embedded feasibility when selecting a digital twin approach. Future improvements can be achieved by incorporating additional physically meaningful inputs such as power-related variables, losses, or cooling capacity, and by leveraging design knowledge where available. Finally, since housing and stator winding temperatures show a strong correlation with similar temporal progression, the presented approach for housing temperature prediction can serve as a basis for predicting the stator winding temperature, subject to application-specific calibration and validation, as a promising alternative to costly physical sensing.
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Bundesministerium für Wirtschaft und Klimaschutz, Bundesministerium für Bildung und Forschung (Grant Number 13IK024D, 13FH585KX0).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
