Abstract
Friction torque serves as a critical metric for assessing the dynamic performance of tapered roller bearings (TRBs), and is influenced by factors such as input speed, temperature, preload force, radial load, and fit tolerance. Fit tolerances are design variables, while preload force, radial load, and speed are operational variables, with temperature acting as a response variable. This study proposes a friction torque prediction method for TRBs using the XGBoost algorithm enhanced by Bayesian optimization (BO). Additionally, other ensemble learning algorithms, including random forest, AdaBoost, CatBoost, LightGBM, and NGBoost, are employed for comparison. Performance is evaluated using root mean square error (RMSE), mean absolute error (MAE), and R-squared (R²). Experimental results indicate that XGBoost outperforms the others during training. However, the differences in prediction performance among random forest, CatBoost, LightGBM, NGBoost, and XGBoost are minimal. When combined with SHapley Additive exPlanations (SHAP), speed is identified as the most significant factor affecting friction torque, followed by temperature.
Introduction
Friction torque is a vital indicator of the assembly status of tapered roller bearings. When a load is applied to the bearing, friction torque arises from two primary sources: rolling friction between the tapered rolling elements and the raceways due to elastic hysteresis of the material, and sliding friction between the large end face of the rolling elements and the inner ring flanges.
To obtain the friction torque of tapered roller bearings, Aihara 1 provided empirical formulas for TRB friction torque, explaining the relationship between starting friction torque and preload force. Lostado et al. 2 employed theoretical formulas and finite element analysis to determine TRB rotational torque, and validated the results with experiments. Wingertszahn et al. 3 developed the LaMBDA dynamic model for TRBs, which accurately simulates friction torque compared to experimental measurements. Dindar et al. 4 proposed a methodology for measuring power losses of TRBs under combined radial and axial loads. Takahashi et al. 5 experimentally investigated the effects of bearing clearance and atmospheric temperature on the performance of pinion bearings of railway vehicles. The results showed that the bearing clearance decreases immediately after rotation starts, and that this tendency becomes more pronounced the smaller the initial bearing clearance and the lower the atmospheric temperature.
To reduce the friction torque of tapered roller bearings, Hotta et al. 6 explored how preload loss relates to taper and bearing width by observing the preload change while the tapered roller bearing rotates. Based on the ISO 281 standard, Dragoni 7 maximized the rating life of tapered roller bearings by optimizing the internal dimensions. The results show that the basic rating life increases quadratically with the roller infill and the aspect ratio of the rollers and with the sixth power of the pitch diameter of the roller set, but decreases with the third power of the applied radial force. Jain et al. 8 examined micro-level factors affecting TRB friction torque, including surface treatment and roughness. Vasiliev et al. 9 listed adjustment parameters of differential bearing preload and defined a method for adjusting the preload based on the deformation of bearing seats. Wirsching et al. 10 found a trade-off between high load carrying capacity and low frictional losses. Hayashi et al. 11 studied the effects of temperature distribution on preload force fluctuations, improving TRB fatigue life and reducing friction torque. Martinez et al. 12 optimized TRB working conditions concerning the preload, axial load, radial load, and torque with the contact ratio for each of the two columns of rollers, the gap difference, and local deformation, using a three-dimensional finite element model and soft computing techniques.
The above studies achieved the calculation and measurement of friction torque through the design of structural variables and operating conditions of tapered roller bearings. Given the complexity of measuring TRB friction torque in transmission systems, machine learning algorithms are employed for prediction.13–15 Ensemble learning methods, such as Bagging, 16 Boosting,17–21 and Stacking, are used to enhance model robustness and accuracy. 22 XGBoost, known for its high accuracy, efficiency in handling high-dimensional data, and generalization ability, is widely applied in various fields, such as driving assessment and risk prediction 23 and concrete electrical resistance prediction. 24 In addition, parameter optimization of XGBoost based on the Bayesian optimization algorithm has been achieved. 25 Although complex machine learning algorithms such as ensemble models perform well in prediction, they are often seen as black boxes whose internal decision-making processes are difficult to explain. SHAP, as a tool to address this issue, offers interpretability for non-linear models,26,27 an advantage that is absent in empirical models.
Based on the above description, an approach based on Bayesian optimized XGBoost is proposed for the friction torque prediction of tapered roller bearings under various influencing factors such as the fit tolerance of the shaft/hole with bearings, temperature, speed, preload force, and radial load during the bearing assembly process. Combined with SHAP, the relationships between various influencing factors and the friction torque are explained, thereby achieving rapid adjustment of the friction torque of tapered roller bearings to non-standard operating regimes.
Theoretical models for ensemble learning
XGBoost
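XGBoost, 21 a Boosting algorithm, constructs a model through iterative tree building. Assuming that the XGBoost model consists of K trees, the predicted value can be expressed as:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$
where $\hat{y}_i$ is the predicted value of the i-th sample, $x_i$ is its feature vector, $f_k$ is the k-th regression tree, and $\mathcal{F}$ is the space of regression trees.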
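The loss function of XGBoost consists of two parts: the empirical loss term and the regularization term. The empirical loss term measures the difference between the predicted value and the true value, while the regularization term controls the complexity of the model and prevents overfitting. The loss function can be expressed as:
$$L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \quad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$
where $l$ is the empirical loss between the true value $y_i$ and the predicted value $\hat{y}_i$, $n$ is the number of samples, $\Omega$ is the regularization term, $T$ is the number of leaves of a tree, $w_j$ is the weight of the j-th leaf, and $\gamma$ and $\lambda$ are regularization coefficients.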
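XGBoost adopts a forward stagewise algorithm to construct the model gradually. To optimize the loss function quickly, a second-order Taylor expansion is applied to the empirical loss term:
$$L^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$
where $\hat{y}_i^{(t-1)}$ is the prediction after $t-1$ iterations, $f_t$ is the tree added at the t-th iteration, and $g_i = \partial l / \partial \hat{y}_i^{(t-1)}$ and $h_i = \partial^2 l / \partial \left(\hat{y}_i^{(t-1)}\right)^2$ are the first- and second-order derivatives of the empirical loss.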
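As a worked step, the second-order approximation can be minimized leaf by leaf. The sketch below follows the standard XGBoost derivation rather than anything reproduced from the source; the symbols $I_j$, $G_j$, $H_j$, and $w_j^{*}$ are introduced here for illustration, and the regularization term is assumed to take the usual form stated in the first comment.

```latex
% Assumption: \Omega(f_t) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2,
% with T leaves and leaf weights w_j; I_j is the set of samples assigned to leaf j.
% The term l(y_i, \hat{y}_i^{(t-1)}) does not depend on f_t and is dropped.
\begin{aligned}
G_j &= \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i, \\
\tilde{L}^{(t)} &= \sum_{j=1}^{T} \left[ G_j w_j + \tfrac{1}{2}\,(H_j + \lambda)\, w_j^2 \right] + \gamma T, \\
w_j^{*} &= -\frac{G_j}{H_j + \lambda}, \\
\tilde{L}^{(t)}\Big|_{w^{*}} &= -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T .
\end{aligned}
```

Each leaf contributes $-\tfrac{1}{2} G_j^2 / (H_j + \lambda)$ plus a fixed cost $\gamma$, so replacing one leaf with two changes the objective by the difference of these leaf terms minus one extra $\gamma$. That difference is what the split gain measures.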
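When constructing a decision tree, XGBoost selects the optimal splitting point by calculating the split gain. The split gain is defined as the difference between the loss function before and after splitting:
$$\text{Gain} = \frac{1}{2}\left[ \frac{\left(\sum_{i \in I_L} g_i\right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma$$
where $I_L$ and $I_R$ respectively represent the sample sets contained in the left and right subtrees after the split, $I = I_L \cup I_R$ is the sample set of the node before splitting, $g_i$ and $h_i$ are the first- and second-order derivatives of the empirical loss for sample i, $\lambda$ is the leaf-weight regularization coefficient, and $\gamma$ is the penalty for adding a leaf.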
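The split gain can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not the library's internal implementation: it assumes the standard XGBoost gain with regularization coefficients `lam` and `gamma`, and a squared-error loss $l = \tfrac{1}{2}(y - \hat{y})^2$, for which the gradient is $\hat{y} - y$ and the Hessian is 1. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def leaf_score(g_sum: float, h_sum: float, lam: float) -> float:
    """Structure score G^2 / (H + lambda) of a single leaf."""
    return g_sum * g_sum / (h_sum + lam)


def split_gain(
    g: Sequence[float],
    h: Sequence[float],
    left_mask: Sequence[bool],
    lam: float = 1.0,
    gamma: float = 0.0,
) -> float:
    """Gain of splitting one node into left/right children (standard XGBoost form)."""
    g_left = sum(gi for gi, m in zip(g, left_mask) if m)
    h_left = sum(hi for hi, m in zip(h, left_mask) if m)
    g_total, h_total = sum(g), sum(h)
    g_right, h_right = g_total - g_left, h_total - h_left
    return 0.5 * (
        leaf_score(g_left, h_left, lam)
        + leaf_score(g_right, h_right, lam)
        - leaf_score(g_total, h_total, lam)
    ) - gamma


# Toy example (hypothetical data): targets with an initial prediction of 0.
y_true = [1.0, 1.0, 3.0, 3.0]
y_pred = [0.0, 0.0, 0.0, 0.0]
grad = [p - t for p, t in zip(y_pred, y_true)]  # gradient of 0.5 * (y - y_hat)^2
hess = [1.0] * len(y_true)                      # Hessian of the same loss

# Candidate split: first two samples go left, last two go right.
example_gain = split_gain(grad, hess, [True, True, False, False], lam=1.0, gamma=0.0)
print(round(example_gain, 4))
```

A positive gain means the split lowers the regularized objective; raising `gamma` makes the same split less attractive, which is how the leaf penalty discourages overly deep trees.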
SHapley Additive exPlanations
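The Shapley value was originally used to measure individual “contributions” in multi-person collaboration, calculating the impact of each participant on the final outcome. 27 In machine learning, Shapley values are used to measure the contribution of each feature to the model’s prediction. Assuming the model has N features, the Shapley value $\phi_i$ of feature i can be expressed as:
$$\phi_i = \sum_{S \subseteq \{1,\dots,N\} \setminus \{i\}} \frac{|S|!\,(N - |S| - 1)!}{N!} \left[ f(S \cup \{i\}) - f(S) \right]$$
where S is a feature subset that does not include feature i; $|S|$ is the number of features in S; and $f(S)$ is the model output evaluated using only the features in S.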
In machine learning models, the predicted value f(x) can be decomposed into the sum of the contributions of each feature, and the Shapley value quantifies each of these contributions. Specifically, the predicted value of the model can be expressed as:
$$f(x) = \phi_0 + \sum_{i=1}^{N} \phi_i$$
in which $\phi_i$ is the Shapley value of the i-th feature, representing its contribution to the predicted value; $\phi_0$ is the base value, i.e. the expected model output when no feature information is available.
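The additive decomposition can be checked on a small example by enumerating every feature subset. The sketch below is illustrative only: it computes exact Shapley values by brute force, and it assumes that an "absent" feature is replaced by a fixed baseline value of 0, which is a simplification of the expectations used by the SHAP library. The toy model `toy_model` and all names are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_values(value_fn: Callable[[FrozenSet[int]], float], n_features: int) -> List[float]:
    """Exact Shapley values by enumerating all subsets S that exclude feature i."""
    phis = []
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        phi = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (N - |S| - 1)! / N! depends only on the subset size.
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (value_fn(s | {i}) - value_fn(s))
        phis.append(phi)
    return phis


def toy_model(x: List[float]) -> float:
    """Hypothetical non-linear model with an interaction between features 1 and 2."""
    return 2.0 * x[0] + x[1] * x[2]


x_sample = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]  # assumption: absent features take this baseline value


def value_fn(subset: FrozenSet[int]) -> float:
    """Model output when only the features in `subset` take their observed values."""
    masked = [x_sample[j] if j in subset else baseline[j] for j in range(len(x_sample))]
    return toy_model(masked)


phi = shapley_values(value_fn, n_features=3)
phi_0 = value_fn(frozenset())  # base value: output with no feature information
print(phi, phi_0 + sum(phi), toy_model(x_sample))
```

The interaction term is shared equally between features 1 and 2, and the base value plus the three contributions reproduces the model output. Brute-force enumeration scales as $2^N$, so it is only practical for a handful of features; tree-based explainers avoid this cost.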
Assembly test design for bearing friction torque
The complexity of the transmission mechanism makes it difficult to measure the friction torque of bearings. Therefore, a test bench is designed to reproduce the influencing factors during the assembly process, including the fit tolerance between the bearing inner ring and the shaft tol_inner, the fit tolerance between the bearing outer ring and the seat tol_outer, the input speed n, the preload force $F_a$, the radial load $F_r$, and the working temperature T.
The principle and bench for testing the friction torque of the combined bearings are shown in Figure 1. 28 The detection system includes two parts: the mechanical system and the measurement and control system. The mechanical system consists of three parts: the loading mechanism, the driving mechanism, and the sensor mechanism. The measurement and control system includes hardware and software systems. The hardware comprises a control module and a signal acquisition module. The control module hardware includes motors, drivers, motion control cards, and industrial computers, which carry out the motion control during the friction torque measurement; the signal acquisition module hardware includes force sensors, signal conditioners, acquisition cards, and industrial computers, which are responsible for torque acquisition, conditioning, result display, and storage. A 10W-40 heavy-duty power transmission universal lubricating oil is used in the experiments, with a controlled oil inlet flow rate of 2 ± 0.3 L/min.

Figure 1. Combined bearings friction torque test principle and bench: (a) test principle and (b) test bench.
On both sides are supporting bearings, and in the middle are the SKF 32019X tapered roller bearings used in combination. In the test, the friction torque of the combined tapered roller bearings equals the torque measured by the torque sensor minus the friction torque of the supporting bearings. The test plan is shown in Table 1, where the fit tolerance between the bearing and the shaft and the fit tolerance between the bearing and the bearing seat are coupled into four groups. For each of the four tolerance fit pairs, the friction torque of the combined bearings is tested at four speeds, four temperatures, four preload forces, and four radial loads. Hence, there are a total of 4 × 4 × 4 × 4 × 4 = 1024 test condition points. In addition, each condition was repeated three times, and preheating was conducted until thermal stability was reached (≤±5°C). The measured friction torque with speed, temperature, and radial load is shown in Figure 2.
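The count of 1024 condition points follows from a full factorial design over five four-level factors. The enumeration can be sketched as below; the factor names are illustrative, and the levels are placeholder indices 1 to 4 because the actual values belong to Table 1 and are not reproduced here.

```python
from itertools import product

# Hypothetical factor names; levels are placeholder indices 1..4,
# not the actual values from Table 1.
FACTORS = ["tolerance_pair", "speed", "temperature", "preload_force", "radial_load"]
LEVELS = range(1, 5)

# Full factorial design: every combination of one level per factor.
design = [dict(zip(FACTORS, combo)) for combo in product(LEVELS, repeat=len(FACTORS))]

print(len(design))  # 4**5 = 1024 condition points
```

Treating the coupled inner/outer fit tolerances as a single `tolerance_pair` factor is what keeps the design at five factors instead of six.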
Table 1. Friction torque test design of the combined tapered roller bearings.

Figure 2. Bearing friction torque with the corresponding influencing factors: (a) speed, (b) temperature, (c) radial load, and (d) friction torque.
Bearing friction torque prediction
Prediction of bearing friction torque
The experimental data, including the fit tolerance between the shaft and the inner ring of the bearing, the fit tolerance between the bearing seat and the outer ring of the bearing, the speed, temperature, axial preload force, and radial load, are input into XGBoost for training, as shown in Figure 3. The total sample size is 52,668, of which 90% are randomly selected for training and the remaining 10% are used for testing. The Bayesian optimization process is set to five iterations, and cross-validation is performed two times. The other ensemble algorithms, namely random forest, AdaBoost, CatBoost, LightGBM, and NGBoost, use the same settings for comparison. In addition, a transformer model and an artificial neural network (ANN) with an autoencoder are employed. In the transformer model, the number of layers is set to 2 and the number of epochs to 200; the hidden size is set to 15 in the autoencoder and to 10 in the ANN, with 1000 epochs.
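The random 90/10 partition can be sketched with the Python standard library. This is a hedged illustration, not the authors' pipeline: the function name, the fixed seed, and the rounding of the test-set size are assumptions, since the source states only the total sample size and the split ratio.

```python
import random
from typing import List, Tuple


def train_test_split_indices(
    n_samples: int, test_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and hold out a fraction of them for testing."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed (assumption) so the split is reproducible
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)  # rounding rule is an assumption
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(52668, test_fraction=0.1)
print(len(train_idx), len(test_idx))
```

Splitting indices rather than rows keeps the same partition reusable across all compared models, so that differences in test scores reflect the algorithms and not the data they happened to see.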

Figure 3. Prediction process for friction torque of tapered roller bearings.
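To evaluate the performance of the different ensemble algorithms for bearing friction torque prediction, three statistical indicators, namely RMSE, MAE, and R², are used. They are expressed as follows:
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \quad \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \quad R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
where $y_i$ and $\hat{y}_i$ are the measured and predicted friction torque of the i-th sample, $\bar{y}$ is the mean of the measured values, and $n$ is the number of samples.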
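The three indicators are straightforward to compute. The following is a minimal standard-library sketch using their usual definitions; the function names and the toy values are hypothetical and are not taken from the experimental data.

```python
from math import sqrt
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(rmse(measured, predicted), mae(measured, predicted), r2(measured, predicted))
```

RMSE penalizes large errors more heavily than MAE, so a model with a few large misses can rank differently under the two; reporting both alongside R² gives a fuller picture of fit quality.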
The estimated parameters of each ensemble algorithm are shown in Table 2, and the fitting results of all machine learning algorithms are shown in Figure 4. It can be seen that the scatter of the XGBoost fit is relatively small. The RMSE, MAE, and R² values of each algorithm during the training and testing stages are shown in Table 3. XGBoost has the best fit during the training stage, and it is therefore used for the subsequent SHAP analysis. Although the testing performance of XGBoost is not as good as that of random forest, CatBoost, and LightGBM, the results are not significantly different.
Table 2. Estimated parameters of ensemble algorithms with Bayesian optimization.

Figure 4. Friction torque prediction of the combined tapered roller bearings with different machine learning algorithms: (a) XGBoost, (b) NGBoost, (c) CatBoost, (d) LightGBM, (e) AdaBoost, (f) Random forest, (g) Transformer, and (h) Autoencoder-ANN.
Table 3. Statistical indicators of different machine learning algorithms under training and testing stages.
Influencing factor analysis
The influence of the various factors on bearing friction torque is shown through the feature importance based on the mean Shapley value, which reflects the influence of each factor over the whole sample. The beeswarm plot based on Shapley value contributions reflects the magnitude and direction of the contribution of each influencing factor to the bearing friction torque for each sample, as shown in Figure 5. SHAP analysis reveals that the speed has the greatest impact on the friction torque of the bearing, followed by temperature. Therefore, the main focus is on the variation trend of bearing friction torque with speed and temperature. The design scheme is shown in Table 4. With fit tolerances of 0.03/0.01 mm, a preload force of 4100 N, a radial load of 12,000 N, and a temperature of 90°C, the trend of bearing friction torque with speed is predicted; with fit tolerances of 0.03/0.01 mm, a preload force of 4100 N, a radial load of 12,000 N, and a speed of 1500 r/min, the trend of bearing friction torque with temperature is predicted. The trend prediction results are shown in Figure 6. It can be seen that the friction torque increases as the speed increases, and decreases as the temperature increases. To enhance the practical utility of the results, a formula for the change of friction torque with speed and temperature based on the predicted data is provided as follows:
in which Sp and Te respectively represent the speed and the temperature, under the conditions of fit tolerances of 0.03/0.01 mm, a preload force of 4100 N, and a radial load of 12,000 N.

Figure 5. Contribution of influencing factors to the friction torque of the combined tapered roller bearings.
Table 4. Friction torque variation with speed and temperature.

Figure 6. Friction torque variation of the combined tapered roller bearings: (a) friction torque variation with speed and (b) friction torque variation with temperature.
Conclusion
An approach for predicting the friction torque of tapered roller bearings based on Bayesian-optimized XGBoost is proposed. It addresses friction torque prediction for tapered roller bearings under various influencing factors during the bearing assembly process, such as the fit tolerance of the shaft/hole with the bearings, temperature, speed, preload force, and radial load. Combined with SHAP, interpretable feature analysis is realized. The results show that speed has the greatest influence on bearing friction torque, followed by temperature, which supports rapid adjustment of the friction torque of tapered roller bearings to non-standard operating regimes. In comparison with random forest, AdaBoost, CatBoost, LightGBM, NGBoost, the Transformer model, and the Autoencoder-ANN, XGBoost has the best predictive performance during the training stage, with an RMSE of 0.24, an MAE of 0.071, and an R² of 0.996; during the testing stage, the performance of XGBoost is lower than that of random forest, CatBoost, and LightGBM, but the difference is not significant.
Building on the findings of this study, potential applications are as follows:
(1) Adaptive real-time monitoring systems: integrating the proposed model into industrial sensors for rotating machinery to predict friction fluctuations under transient, non-standard operating conditions (e.g. sudden load spikes or lubricant degradation), enabling proactive maintenance and reducing downtime.
(2) Optimization of bearing design: using the model to simulate friction behavior under non-standard scenarios during the design phase, reducing the need for costly physical prototyping and accelerating the development of specialized bearings.
(3) Enhanced lubrication strategy formulation: leveraging the model’s predictive power to recommend tailored lubricant replacement intervals or viscosity adjustments for operating equipment, improving energy efficiency and extending component lifespan.
Footnotes
Handling Editor: Chenhui Liang
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
