Sage Journals: Discover world-class research

Abstract

Temperature rise failure of the high-speed bearing in the gearbox of a 5 MW wind turbine is one of the important factors affecting the stable operation of the turbine; accurate prediction and timely diagnosis can effectively improve its efficiency. In this paper, a combined modelling method for predicting the temperature rise of the gearbox high-speed bearing, based on Bayesian-LightGBM and improved PSO-SVM, is proposed with a 5 MW wind turbine as the research object. Firstly, an initial dimensionality reduction of the SCADA data is performed with a sparse random projection matrix, which reduces redundant data. Secondly, feature selection is performed on the remaining data using Bayesian-LightGBM to identify 13 key input feature parameters. Then, the hyperparameters of the PSO algorithm are optimised using a Bayesian algorithm, and the optimised PSO algorithm is applied to identify the SVM parameters. Finally, a simulation experiment platform is established in MATLAB to verify the temperature rise prediction for high-speed bearings in wind turbine gearboxes through example calculation and comparative analysis. The results show that the proposed model is effective, its predictions are accurate and its performance is stable. In comparison with algorithms such as PSO-SVM, SVM and BP, the coefficient of determination of the proposed algorithm is greater than 0.994 on both the training set and the test set, and the average decision percentage error is around 1.88% on both.

Keywords

Wind turbine, fault diagnosis, Bayesian-LightGBM model, improved PSO-SVM model

Introduction

The 5 MW wind turbine generator (WTG) is a key piece of equipment commonly used in wind power technology to capture wind energy and convert it into electricity. It generally consists of blades, drive shafts, a gearbox, a generator, etc.^1,2 The WTG gearbox is a key component of the wind turbine transmission chain, and its reliability and stability are a guarantee for the safe operation of the WTG.³ Literature⁴ compiled the component maintenance cost ratio and downtime ratio for the complete wind turbine equipment of an offshore wind farm, as shown in Figures 1 and 2, respectively. As can be seen from the figures, the downtime caused by the gearbox accounts for more than 21% of the overall unscheduled downtime, and its maintenance cost accounts for more than 18% of the total maintenance cost. Therefore, the gearbox is a component that deserves particular attention in the troubleshooting of WTGs.

Figure 1.

Ratio of maintenance costs of wind power equipment by component.

Figure 2.

Ratio of downtime of wind turbine components.

A nationwide survey on the operation quality of wind power equipment by the Wind Energy Professional Committee of the China Renewable Energy Society points out that gearbox failure accounted for about 21%. The reason is that the WTG gearbox works under harsh conditions and is subjected to heavy loads and time-varying impacts for long periods, so its parts are continuously shocked and are relatively more prone to failure.⁴ In the WTG, high-speed bearing temperature rise is one of the most important failures, because the high-speed bearing supports the high-speed rotating parts of the gearbox; once a temperature rise failure occurs in the high-speed bearing, it leads to unstable operation of the gearbox or even damage to it.

Most WTGs are installed in geographically complex areas, so once the gearbox high-speed bearings fail, repairs are expensive, overhaul and maintenance are very difficult, and large economic losses are likely. In addition, WTGs must run for long periods under heavy load, so failure analysis of WTGs is very necessary.^5–7 Literature⁸ proposed the use of AGV algorithms in automated container terminals, using intelligent algorithms to improve the efficiency of equipment operation. Similarly, the fault diagnosis of wind power generation can be automated and made intelligent. The real-time monitoring and collection of WTG operating data are continuously improving, and these data directly or indirectly reflect the operational status of WTG components. Forecasting shifts WTG maintenance from reactive repair to proactive prevention, which significantly reduces maintenance costs. Literature⁹ treated the effect of different factors on the variation of EEOI and applied the key factors to determine the energy efficiency of ships as a whole, which provided considerable inspiration for this work. Literature¹⁰ took the rotation angle of the vessel into account and constructed a more complex feature extraction mechanism; the resulting RYM model significantly improves the accuracy and efficiency of vessel detection in marine videos. Similarly, this paper considers the unit operating parameters in a holistic manner so as to improve, as far as possible, the accuracy of temperature rise fault diagnosis for high-speed bearings in wind turbine gearboxes.
The Supervisory Control and Data Acquisition (SCADA) system records key parameters such as electrical quantities, temperature, pressure and vibration during unit operation at minute resolution, for example unit output power, rotor speed, wind speed, bearing temperature, winding temperature and gearbox oil temperature, and these records contain a wealth of operational information.^11–14 Because of this volume and variety of data, predicting the behaviour of a single component is difficult, which affects prediction accuracy.¹⁵ Based on this, this paper adopts a combined model approach based on WTG operating data to diagnose temperature rise faults of the high-speed bearings in the gearbox of a 5 MW WTG.

Fault feature extraction is based on signal analysis theory; it improves the accuracy of subsequent model training by extracting the effective feature information from the original signal. Processing the raw signal is therefore one of the key links in fault diagnosis.¹⁶ Scholars at home and abroad have carried out a great deal of research on this problem and proposed many methods, which provide direction for this article. Literature¹⁷ proposed a generalised regression neural network that integrates a single interpolation algorithm, principal component analysis and a wavelet-based probability density function method; applied to the condition monitoring of wind turbine blades, it is able to detect WTG blade faults in advance. Literature¹⁸ used Neighbourhood Component Analysis as a feature selection technique to select suitable input features, with gearbox oil temperature as the target attribute, and predicted the oil temperature with high accuracy. Literature¹⁹ proposed a new method based on the sliding-window compressible Gramian Angular Field (GAF) transform and the distance features of the GAF image matrix to diagnose and classify faults of rolling bearings and gears in gearboxes. The method converts one-dimensional fault signals into two-dimensional feature matrices, and constructs the discriminant matrix for each fault category from the average of the a priori sample feature matrices.
Literature²⁰ first extracted low-dimensional fault data by principal component analysis preprocessing, then optimised the parameters of the support vector machine model with the Grid Search Algorithm (GSA), used the extracted low-dimensional fault features as inputs to train the support vector machine, and finally classified the features. Literature²¹ used a grey correlation algorithm to extract the feature vectors of the monitoring data; the parameters of the support vector regression model were optimised using a genetic algorithm and cross-validation, and the faults were predicted with high accuracy. Literature²² used the ReliefF algorithm to filter the feature vectors related to the target as the model's input dataset, took the fault-related features as classification features and the different turbine operating states (normal or specific fault types) as classification labels to build the classification dataset, and constructed a multi-class turbine fault diagnosis model based on XGBoost; compared with SVM-based and AdaBoost classification models, the experimental results show that the algorithm has higher diagnostic accuracy. Literature²³ proposed an improved artificial bee colony algorithm to optimise an Elman neural network, addressing the problem that the scarce fault data in a wind farm are insufficient for model training; a gearbox temperature model was established, the residuals obtained by comparing its curve with the actual values were analysed, and a fault diagnosis model for the gearbox was built. Literature²⁴ determined the ship engine type by performing data cleaning, interpolation and multi-data fusion on the data provided by the AIS.

Regression prediction can be carried out mainly with decision trees, support vector machines, neural networks, etc., to predict and analyse the output features. Decision trees are divided into classification trees and regression trees, where classification trees handle discrete target variables and regression trees handle continuous ones; the decision tree algorithm can therefore handle both data classification and regression prediction tasks. Random forest is an ensemble learning algorithm built on decision trees. Literature²⁵ proposed an improved random forest algorithm to address the small differences among the decision trees in a random forest by assigning different weights to the classifications of the trees; the model was used to diagnose motor bearings, but it still needs to be validated for features introduced under variable operating conditions. Literature²⁶ proposed a fault diagnosis model based on the deep forest algorithm, which achieves good diagnostic results under different working conditions; however, its training time is significantly longer than that of algorithms such as random forest. Literature²⁷ proposed a fault warning method based on a Bayesian-optimised long short-term memory (LSTM) neural network, in which the hyperparameters of the network are tuned by Bayesian optimisation; the fault warning was advanced by 4 h, but how to fuse vibration information with multivariate information needs further elaboration. Literature²⁸ proposed a fault diagnosis algorithm based on an improved radial basis neural network, which has high accuracy under fluctuating operating conditions. Literature²⁹ proposed a clustering analysis based on an improved mini-batch k-means algorithm to mine the electricity consumption behaviour of massive numbers of users.
Literature³⁰ introduced a weight vector into the weight calculation of the traditional Gaussian mixture model, so that the algorithm can identify clusters in different subspaces of a high-dimensional space, which further enhances the performance of Gaussian mixture model clustering; however, the application of the improved Gaussian mixture clustering model to data mining in industrial fields needs further exploration. Literature³¹ proposed a density peak clustering approach that first applies the density peak algorithm to cluster the measured wind turbine data into groups, then predicts the wind speed from the clustering results using an LSTM neural network model, and finally verifies the accuracy and effectiveness of the proposed algorithm experimentally. However, the cutoff distance in the density peak algorithm is easily affected by subjective human selection, and the complexity of the local density calculation is high. Literature³² proposed a line loss rate estimation method based on mean shift clustering and an improved BP neural network: the sample data are first classified, and the line loss rate is then calculated for each class of data with the improved BP neural network model and validated by example.

Based on this, this paper combines the Bayesian-LightGBM model with the improved PSO-SVM regression prediction model. The Bayesian-LightGBM model takes full advantage of the automatic tuning of the Bayesian algorithm and the training speed of LightGBM. The improved PSO-SVM model exploits the strength of SVM in handling high-dimensional data together with its strong generalisation ability and robustness. The combination of these algorithms addresses the redundancy of the feature dataset and the problems of high-dimensional data and nonlinear relationships, focuses on hyperparameter tuning, feature engineering and ensemble learning, and aims to provide a solution for temperature rise fault diagnosis of high-speed bearings in the gearbox of a 5 MW wind turbine. In summary, taking the operating data of a 5 MW wind turbine as the experimental object, this paper proposes a combined-model prediction method for the temperature rise of the gearbox high-speed bearing. The model improves the correlation between data feature parameters, reduces the redundancy of the original feature dataset, and improves the accuracy of temperature rise prediction for high-speed bearings. How to effectively select relevant data for fault warning data mining, and how to improve the accuracy of fault warning, are practical problems faced when modelling the temperature rise of high-speed bearings in wind turbine gearboxes.
Therefore, this paper proposes a combined modelling approach based on Bayesian-LightGBM and improved PSO-SVM for predicting the temperature rise of high-speed bearings in wind turbine gearboxes. It consists of three parts: feature dimensionality reduction by sparse random projection matrix, feature visual analysis with Bayesian-optimised LightGBM, and temperature rise prediction based on the Bayesian-PSO-SVM algorithm. These correspond to the problems that need to be studied in the modelling of wind turbine failure warning.

The innovation is reflected in the following aspects: (1) Sparse matrix projection is first used to reduce the dimensionality of the features, and Bayesian-LightGBM is then used to select the appropriate feature parameters. This makes full use of the advantages of feature engineering and hyperparameter optimisation, improving the model's performance and generalisation ability. (2) Different machine learning algorithms are integrated: Bayesian-PSO-SVM is used as the final prediction model, combining Bayesian optimisation, particle swarm optimisation and support vector machines, which improves the robustness and prediction accuracy of the model.

The structure of this paper is as follows: Section 2 introduces sparse random matrix projection, the Bayesian algorithm, the LightGBM model, the PSO algorithm and the SVM algorithm. Section 3 describes the Bayesian optimisation of the LightGBM model and of the PSO algorithm, and the PSO optimisation of SVM. Section 4 describes the preprocessing of the raw data of the 5 MW WTG, the example simulation analysis, the prediction of the temperature rise of the WTG gearbox high-speed bearing from the feature parameters under the Bayesian-PSO-SVM algorithm, and the analysis and comparison of the predicted data. Section 5 draws the conclusions of this paper. Figure 3 shows a sketch of the experimental process of this paper.

Figure 3.

Layout sketch of the structure of this paper.

Fundamentals

Sparse random matrix projection

SCADA data from WTGs usually contain a large amount of time series data recording the variation of parameters such as wind speed, rotational speed and power output. Such data are usually high dimensional and sparse, so sparse random matrix projection is suitable for feature dimensionality reduction, preserving the distance relationships between data points as far as possible.

The specific computational steps for the sparse random matrix projection are as follows:

Initialising a sparse matrix: first, a sparse matrix $A \in R^{k \times d}$ is randomly generated, with most of its elements equal to 0 and only a few non-zero elements. The non-zero elements are commonly drawn by a random generation method such as the Gaussian distribution approach.

The sparse matrix $A$ is then multiplied with each point $x_{i} \in R^{d}$ of the original dataset to achieve a projection from the high dimension $d$ to the low dimension $k$.

This matrix multiplication is shown by equation (1):

\begin{matrix} y_{i} = A x_{i} \end{matrix}

(1)

where $y_{i}$ is the low-dimensional data point after dimensionality reduction, $A$ is the sparse matrix, and $x_{i}$ is the original data point.

After the dimensionality reduction calculations, the low-dimensional data points $y_{1}, y_{2}, \ldots, y_{50}$ required subsequently are obtained (the dimension i of the selected dimensionality reduction is 30).

Sparse random matrix projection is an unsupervised algorithm. Its advantage is that it does not need labelled target information, which makes it suitable for data processing when labels cannot be obtained or are incomplete.
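
The projection in equation (1) can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the entry distribution (values $\pm \sqrt{s / k}$ with probability $1 / (2 s)$ each and 0 otherwise, a common Achlioptas-style choice), the density parameter `s`, the sizes `d = 200` and `k = 30`, and the function names are all assumptions made for illustration.

```python
import math
import random


def sparse_projection_matrix(k, d, s=3, seed=0):
    """Return a k x d sparse random matrix A (assumed Achlioptas-style).

    Each entry is +sqrt(s/k) or -sqrt(s/k) with probability 1/(2s) each,
    and 0 with probability 1 - 1/s, so most entries are zero.
    """
    rng = random.Random(seed)
    scale = math.sqrt(s / k)
    matrix = []
    for _ in range(k):
        row = []
        for _ in range(d):
            u = rng.random()
            if u < 1 / (2 * s):
                row.append(scale)
            elif u < 1 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def project(A, x):
    """Compute y = A x for one sample x, as in equation (1)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


# Hypothetical sizes: d = 200 raw features reduced to k = 30.
d, k = 200, 30
A = sparse_projection_matrix(k, d)
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # stand-in for one SCADA record
y = project(A, x)
print(len(x), "->", len(y))
```

Because the matrix is generated without looking at any target variable, the same `A` can be reused for every record, which matches the unsupervised nature of the method described above.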

LightGBM decision tree

Wind turbine operation data contain a large set of input feature parameters, which need to be selected to improve the model. The Light Gradient Boosting Machine (LightGBM) model has a strong feature selection ability: it can automatically rank the important features and down-weight the unimportant ones. The feature preprocessing of wind turbine operation data is therefore closely related to the LightGBM model, and modelling with LightGBM can better deal with some common problems in feature engineering and improve the predictive ability of the model.

LightGBM is a Gradient Boosting Decision Tree (GBDT) model, that is, an ensemble algorithm built on decision trees.^32,33 It is a novel ensemble decision tree algorithm proposed by Ke et al.^34,35 LightGBM optimises data processing with Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). GOSS keeps the data instances with large gradients and randomly samples those with small gradients when estimating the information gain, which accelerates the calculation while limiting the information loss caused by discarding small-gradient data. EFB merges mutually exclusive sparse features, reducing the data dimension and further improving computational speed. The GBDT model can be expressed as in equation (2):

\begin{matrix} f_{M} (x) = \sum_{m = 1}^{M} T (x; θ_{m}) \end{matrix}

(2)

In equation (2), $T (x; θ_{m})$ denotes the m-th decision tree, $θ_{m}$ denotes its parameters, and $M$ is the number of trees.

Based on the forward stagewise algorithm, the GBDT model at step m can be expressed as:

f_{m} (x) = f_{m - 1} (x) + T (x; θ_{m})

(3)

Let the loss function be $L$. The parameter $θ_{m}$ of the m-th tree is obtained by minimising the loss over the $N$ training samples $(x_{i}, y_{i})$, as in equation (4):

\hat{θ}_{m} = \arg \min_{θ_{m}} \sum_{i = 1}^{N} L (y_{i}, f_{m - 1} (x_{i}) + T (x_{i}; θ_{m}))

(4)

Multiple iterations are performed to update the regression tree and obtain the final model.
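
The forward stagewise update in equations (3) and (4) can be sketched in plain Python. This is an illustrative toy, not LightGBM itself: it assumes a squared-error loss (so that minimising equation (4) reduces to fitting each tree to the current residuals), uses one-split regression stumps as the trees $T (x; θ_{m})$, and omits LightGBM's sampling and feature bundling. The data and function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # theta_m = (threshold, left value, right value)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees):
    """Forward stagewise fitting: f_m(x) = f_{m-1}(x) + T(x; theta_m)."""
    trees = []
    preds = [0.0] * len(xs)  # f_0(x) = 0
    for _ in range(n_trees):
        # Squared loss: the best new tree is the one fitted to the residuals.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        preds = [p + predict_stump(stump, x) for p, x in zip(preds, xs)]
    return trees


def predict(trees, x):
    """Equation (2): the final model is the sum of all trees."""
    return sum(predict_stump(t, x) for t in trees)


def mse(trees, xs, ys):
    return sum((y - predict(trees, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: y = x^2 on a small grid.
xs = [i / 10 for i in range(20)]
ys = [x ** 2 for x in xs]
mse_1 = mse(boost(xs, ys, n_trees=1), xs, ys)
mse_20 = mse(boost(xs, ys, n_trees=20), xs, ys)
print(mse_1, mse_20)
```

Each added tree can only reduce the training error in this setting, which is the sense in which repeated iterations refine the regression model.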

Bayesian optimisation theory

Bayesian optimisation algorithms help to choose the hyperparameters of a model, thus improving its performance and generalisation.³⁶ In the prediction of WTG data, Bayesian optimisation can find the optimal combination of hyperparameters without trying a large number of different combinations, maximising the prediction accuracy and efficiency of the model. Accurate model predictions in turn help to improve the efficiency and reliability of WTGs and to reduce operating and maintenance costs. In addition, Bayesian optimisation can help to improve the robustness of the model, especially in the presence of noise or uncertainty in the WTG data: the optimised model is better adapted to the characteristics of the data, which improves the accuracy and stability of the prediction.

Bayesian optimisation is a probability-based global optimisation algorithm that gathers information by evaluating a black-box objective function $f (x)$ and then selects the next evaluation position, iterating until the distribution of the parameters stabilises.³⁷

Bayesian optimisation is named after Bayes' theorem, which is used in the process as in equation (5):

p (f | D_{1 : t}) = \frac{p (D_{1 : t} | f) p (f)}{p (D_{1 : t})}

(5)

where $f$ denotes the objective function and $D_{1 : t} = \{(x_{1}, y_{1}), (x_{2}, y_{2}), \ldots, (x_{t}, y_{t})\}$ is the set of observations. $p (f)$ is the prior probability distribution of the objective function $f$, $p (D_{1 : t} | f)$ is the likelihood of the observations, $p (D_{1 : t})$ is the marginal likelihood, and $p (f | D_{1 : t})$ is the posterior distribution of $f$ given the observations.

Support vector machine algorithm

Support vector machine (SVM) regression algorithms can deal with nonlinear relationships.³⁸ A wind turbine is a large and complex system containing multiple nonlinear links: factors such as wind speed, wind direction, blade angle and rotational speed interact with each other, and many of the relationships are unknown and complex. In such an environment, traditional linear models may not accurately capture the dynamic characteristics and nonlinear relationships of the system. Through the kernel trick, the SVM regression algorithm maps the data into a high-dimensional space and builds a more flexible and accurate prediction model for the nonlinear case. The SVM regression algorithm is therefore applicable to the complex wind turbine prediction problem, where it can better handle system complexity and nonlinearity and improve prediction accuracy and generalisation ability. In addition, the SVM regression algorithm is robust to noise and outliers, so it can cope with the abnormal data that may occur in WTG prediction. SVM is a machine learning algorithm proposed by Vapnik.³⁹ Its goal is to obtain an optimal classification hyperplane such that the margin, that is, the distance from the hyperplane to the nearest positively classified points plus the distance to the nearest negatively classified points, is maximised.^40–42

The optimal classification hyperplane searched by SVM is shown in Figure 4 as an example for a binary classification problem.

Figure 4.

Schematic diagram of support vector machine binary classification.

When the sample data are not linearly separable, the essence of SVM is to map the data into a higher-dimensional space in which a separating hyperplane can be found. The hyperplane is found as follows.

The vast majority of data are not linearly separable, so slack variables $ζ_{i}$ are introduced, and the constraints become:

y_{i} (ω^{T} x_{i} + b) \geq 1 - ζ_{i}

(6)

The objective function is:

\frac{1}{2} ∥ ω ∥^{2} + C \sum_{i = 1}^{N} ζ_{i}

(7)

The SVM optimisation problem at this point is:

\min_{ω, b, ζ} (\frac{1}{2} ∥ ω ∥^{2} + C \sum_{i = 1}^{N} ζ_{i})

(8)

\text{s.t.} \quad y_{i} (ω^{T} x_{i} + b) \geq 1 - ζ_{i}, \quad ζ_{i} \geq 0, \quad i = 1, \ldots, N

(9)

where $ζ_{i}$ denotes the slack variables and $C$ denotes the penalty factor. Introducing the Lagrange multipliers $α_{i} \geq 0$ and $μ_{i} \geq 0$, the Lagrangian function is:

\begin{matrix} L (ω, b, ζ, α, μ) = \frac{1}{2} ∥ ω ∥^{2} + C \sum_{i = 1}^{N} ζ_{i} \\ + \sum_{i = 1}^{N} α_{i} [1 - ζ_{i} - y_{i} (ω^{T} x_{i} + b)] - \sum_{i = 1}^{N} μ_{i} ζ_{i} \end{matrix}

(10)

Thus the dual of the optimisation problem can be obtained as:

L = - \frac{1}{2} \sum_{i = 1}^{N} \sum_{j = 1}^{N} α_{i} α_{j} y_{i} y_{j} x_{i}^{T} x_{j} + \sum_{i = 1}^{N} α_{i}

(11)
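
The step from the Lagrangian (10) to the dual (11) is not spelled out above. As a brief worked derivation in the same notation, setting the partial derivatives of (10) with respect to the primal variables $ω$, $b$ and $ζ_{i}$ to zero gives:

```latex
\frac{\partial L}{\partial ω} = 0 \;\Rightarrow\; ω = \sum_{i = 1}^{N} α_{i} y_{i} x_{i}
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i = 1}^{N} α_{i} y_{i} = 0
\frac{\partial L}{\partial ζ_{i}} = 0 \;\Rightarrow\; C - α_{i} - μ_{i} = 0
```

Substituting these back into (10), the terms in $ζ_{i}$ cancel because $C - α_{i} - μ_{i} = 0$, the term in $b$ vanishes because $\sum_{i} α_{i} y_{i} = 0$, and the remaining terms combine into (11). The dual is then maximised over $α$ subject to $\sum_{i} α_{i} y_{i} = 0$ and $0 \leq α_{i} \leq C$, where the upper bound follows from $μ_{i} \geq 0$.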

Here the inner product $x_{i}^{T} x_{j}$ is replaced by a kernel function $K (x_{i}, x_{j})$. The radial basis function in equation (12), where $g$ is the kernel width parameter, is usually chosen as the kernel; it has the advantages of few parameters, nonlinear mapping and fast convergence.

K (x_{i}, x_{j}) = \exp (- \frac{∥ x_{i} - x_{j} ∥^{2}}{2 g^{2}})

(12)
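
As a small illustration of equation (12), the kernel value and the resulting kernel matrix can be computed as follows. The function names, the sample points and the value of `g` are hypothetical. Note that some libraries parameterise the same kernel as $\exp (- γ ∥ x_{i} - x_{j} ∥^{2})$, which corresponds to $γ = 1 / (2 g^{2})$.

```python
import math


def rbf_kernel(x_i, x_j, g=1.0):
    """Radial basis kernel of equation (12): exp(-||x_i - x_j||^2 / (2 g^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * g ** 2))


def gram_matrix(samples, g=1.0):
    """Kernel matrix K with K[i][j] = K(x_i, x_j), used in place of x_i^T x_j."""
    return [[rbf_kernel(a, b, g) for b in samples] for a in samples]


# Hypothetical sample points.
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gram_matrix(samples, g=1.0)
for row in K:
    print([round(v, 4) for v in row])
```

A larger `g` makes distant points look more similar (a smoother model), while a smaller `g` makes the kernel more local, which is why this parameter is tuned together with the penalty factor $C$.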

PSO optimisation algorithm

When predicting the temperature rise of high-speed bearings in wind turbine gearboxes, the PSO algorithm can optimise the parameter selection of the SVM algorithm, so that the model is better adapted to the characteristics of the wind power data and its prediction accuracy and generalisation ability improve. In addition, the PSO algorithm helps the SVM parameter search avoid becoming trapped in local optima, which improves the stability and robustness of the prediction.

The PSO algorithm is inspired by the observed behaviour of animal groups: individuals share information so that the movement of the whole group through the solution space evolves from disorder to order, yielding the optimal solution. Assume that in a D-dimensional search space a swarm of $m$ particles is randomly generated, where $X_{i} = (X_{i 1}, X_{i 2}, \ldots, X_{iD})$ denotes the location of the ith particle. The positions and velocities are initialised, and the fitness function is defined as:

fitness = \frac{S_{right}}{S_{all}}

(13)

where $S_{right}$ denotes the number of correctly categorised samples and $S_{all}$ is the total number of samples.

In the search space, the velocity of the particle is updated iteratively as:

\begin{matrix} V_{id} (n + 1) = W V_{id} (n) + C_{1} r_{1} [P_{id} (n) - X_{id} (n)] \\ + C_{2} r_{2} [P_{gd} (n) - X_{id} (n)] \end{matrix}

(14)

The position of the particle is iteratively represented as:

X_{id} (n + 1) = X_{id} (n) + V_{id} (n + 1)

(15)

In equations (14) and (15), $W$ is the inertia weight, $C_{1}$ and $C_{2}$ are the learning factors, $r_{1}$ and $r_{2}$ are random numbers in $[0, 1]$, and $P_{id} (n)$ and $P_{gd} (n)$ denote the best positions found up to iteration $n$ by particle $i$ and by all particles, respectively.
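
The updates in equations (14) and (15) can be sketched as a short Python routine. This is a minimal illustration rather than the improved PSO of this paper: it minimises a toy sphere function instead of maximising the fitness in equation (13), clips positions to the search bounds, and uses assumed values `w = 0.7`, `c1 = c2 = 1.5`, 20 particles and 100 iterations.

```python
import random


def pso_minimise(objective, dim, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic PSO; returns the best position found and its objective value."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    P = [x[:] for x in X]                    # personal best positions P_i
    P_val = [objective(x) for x in X]
    g_idx = min(range(n_particles), key=P_val.__getitem__)
    G, G_val = P[g_idx][:], P_val[g_idx]     # global best position P_g

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Equation (14): inertia + cognitive term + social term.
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (P[i][d] - X[i][d])
                           + c2 * r2 * (G[d] - X[i][d]))
                # Equation (15), with the position clipped to the bounds.
                X[i][d] = min(hi, max(lo, X[i][d] + V[i][d]))
            val = objective(X[i])
            if val < P_val[i]:
                P[i], P_val[i] = X[i][:], val
                if val < G_val:
                    G, G_val = X[i][:], val
    return G, G_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_val = pso_minimise(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_val)
```

To tune an SVM in this way, each particle position would hold a candidate parameter pair such as $(C, g)$, and `objective` would return a validation error for the SVM trained with those values.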

Bayesian-LightGBM and improved PSO-SVM model

As mentioned earlier, the Bayesian algorithm offers hyperparameter optimisation, adaptive tuning and robustness, while the PSO algorithm offers global search capability. The SVM algorithm has difficulty with high-dimensional data and is easily affected by the curse of dimensionality, whereas the PSO algorithm can search the hyperparameter space more efficiently and improve the processing ability of the model. The Bayesian algorithm can in turn reduce the difficulty of manually setting the PSO parameters and improve its global search ability, and it can also speed up the training and parameter tuning of the LightGBM model. Figure 5 shows the technology roadmap of the fault diagnosis algorithm model in this paper. The pseudocode for Bayesian optimisation of the LightGBM hyperparameters is shown in Algorithm 1.

Figure 5.

Technology roadmap.

Algorithm 1: Bayesian optimised LightGBM hyperparameters

Input: hyperparameter types; tuning ranges; algorithmic model LightGBM;
Output: optimal parameter combination [params];
Step 1: Set the types of the hyperparameters to be tuned and their search space:
ν = {
'min_child_samples': (2,200), // minimum number of samples for leaf nodes
'max_depth': (2,30), // maximum depth of tree model
'num_leaves': (1200), // number of leaf nodes
'learning_rate': (0.001,1) // learning rate
}
Step 2: Initialise to get the dataset $D_{1 : t} = \{(x_{1}, y_{1}), (x_{2}, y_{2}), \ldots, (x_{t}, y_{t})\}$;
Step 3: Evaluate the acquisition function
$EI (x) = \begin{cases} (μ (x) - f (x^{+})) Φ (Z) + σ (x) φ (Z), & \text{if } σ (x) > 0 \\ \max (0, μ (x) - f (x^{+})), & \text{if } σ (x) = 0 \end{cases}$
over the hyperparameter combinations $x$, where $Z = (μ (x) - f (x^{+})) / σ (x)$, $μ (x)$ and $σ (x)$ are the posterior mean and standard deviation at $x$, $f (x^{+})$ is the best value observed so far, and $Φ$ and $φ$ are the standard normal distribution function and density. Obtain the next most promising evaluation point $x_{t + 1} = \arg \max_{x} EI (x; D_{1 : t})$;
Step 4: Calculate the performance $f_{t + 1}$ of the algorithmic model at the sampling point $x_{t + 1}$;
Step 5: Add the new observation, $D_{1 : t + 1} = D_{1 : t} \cup \{(x_{t + 1}, f_{t + 1})\}$, and update the posterior distribution of $f$;
Step 6: Repeat the loop $t$ times and output the optimal parameter combination params = $x'$;
Step 7: Round the integer-type parameters in the parameter combination: assert type (num_leaves, max_depth, min_child_samples) == int
Step 8: Use the optimal parameter combination as the hyperparameter input of the LightGBM algorithm:
return [params]
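
The acquisition step (Step 3 of Algorithm 1) can be sketched as follows. This is an illustration under stated assumptions, not the code used in this paper: it takes the standard expected improvement for a maximisation problem, assumes the surrogate model has already supplied a posterior mean `mu` and standard deviation `sigma` for each candidate, and uses made-up candidate values.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, f_best):
    """Expected improvement over the best observed value f_best."""
    if sigma <= 0.0:
        # No uncertainty left at this point: improvement is deterministic.
        return max(0.0, mu - f_best)
    z = (mu - f_best) / sigma
    return (mu - f_best) * normal_cdf(z) + sigma * normal_pdf(z)


def next_point(candidates, f_best):
    """Return the candidate x with the largest expected improvement.

    candidates is a list of (x, mu, sigma) tuples from the surrogate model.
    """
    return max(candidates,
               key=lambda c: expected_improvement(c[1], c[2], f_best))[0]


# Hypothetical candidates: (label, posterior mean, posterior std).
candidates = [("a", 0.4, 0.0), ("b", 0.4, 0.5), ("c", 0.6, 0.0)]
chosen = next_point(candidates, f_best=0.5)
print(chosen)
```

In this example the uncertain candidate is preferred over a candidate with a slightly higher but certain mean, which illustrates how expected improvement balances exploitation of good regions against exploration of uncertain ones.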

Bayesian improvement for LightGBM

The training set is fed into LightGBM, starting from a randomly generated set of hyperparameters. The Bayesian optimisation algorithm then repeatedly trains the model and adjusts the hyperparameters until the optimal combination is found; importing this combination into the model gives the Bayesian-LightGBM model.

The specific steps are as follows:

Step 1. Randomly generate a set of hyperparameters, train the model with them and calculate the loss function; the result is used as an observation point.

Step 2. From the observation points, fit a Gaussian process to obtain the predicted mean and confidence interval at different points.

Step 3. Evaluate the acquisition function, take the point at which it is largest as the new hyperparameters, and return to model training.

Step 4. Import the optimal hyperparameters into LightGBM to obtain the improved Bayesian-LightGBM model.

The flowchart is shown in Figure 6.

Figure 6.

Flowchart of Bayesian-LightGBM model.

Bayesian improvement of PSO hyperparameters

When traditional PSO algorithms are used, the initial particles are generated randomly, which leads to an uneven distribution of the initial particles in the search space and makes it difficult to find the optimal location. The behaviour of PSO also depends on the following hyperparameters.

(1) Learning factors

The learning factors $c_{1}$ and $c_{2}$ are the acceleration terms applied when the particle moves towards the optimal position. If $c_{1}$ and $c_{2}$ are small, the particle converges more slowly but searches stably between its current position and the optimal position; if they are too small, the particle wanders back and forth within the target area and cannot find the optimal solution efficiently. Conversely, if $c_{1}$ and $c_{2}$ are too large, the particle moves too far in each step and overshoots the target, thus losing the optimal solution.

(2) Inertia weights

The inertia weight $ω$ represents the search capability of the particle in the search space. If $ω$ is too small, the particle is easily trapped in a local optimum; if $ω$ is too large, the global search is expanded so much that the particle may miss the optimal solution. Therefore, it is necessary to choose an appropriate $ω$ .

(3) Maximum number of iterations

The maximum number of iterations maxgen is the total number of updates performed by the PSO algorithm, and it affects both the convergence speed and the final optimisation result. If maxgen is too large, the PSO algorithm runs for too long and may overfit, giving poor results on the test set; if it is too small, the PSO algorithm may stop iterating before the optimal solution is found, which reduces accuracy.

Therefore, a Bayesian algorithm was chosen to optimise the hyperparameters of the PSO (including the learning factors, etc.). The pseudo-code for Bayesian optimisation of the PSO hyperparameters is shown in Algorithm 2.
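The role of the three hyperparameters is easiest to see in a single particle update. The sketch below uses the standard PSO update rule, $v \leftarrow ω v + c_{1} r_{1} (p - x) + c_{2} r_{2} (g - x)$ followed by $x \leftarrow x + v$, where $p$ and $g$ are the individual and group best positions and $r_{1}, r_{2}$ are uniform random numbers in [0, 1). The function name `pso_step` and the value $ω = 0.7$ are hypothetical; $c_{1} = 1.3$ and $c_{2} = 1.17$ are the values reported later in the forecast analysis.

```python
import random


def pso_step(x, v, pbest, gbest, w, c1, c2, rng):
    """One velocity and position update for a single particle."""
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        # Inertia term + pull towards the individual best + pull towards the group best.
        vi_next = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        new_v.append(vi_next)
        new_x.append(xi + vi_next)
    return new_x, new_v


rng = random.Random(0)

# A particle already sitting on both best positions with zero velocity does not move.
x_rest, v_rest = pso_step([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0],
                          w=0.7, c1=1.3, c2=1.17, rng=rng)

# A particle one unit away from the best position: the step is at most c1 + c2.
x_move, v_move = pso_step([0.0], [0.0], [1.0], [1.0],
                          w=0.7, c1=1.3, c2=1.17, rng=rng)
```

Because the step length scales with $c_{1} + c_{2}$, values well above 1 allow the particle to jump past the best position in a single update, which is the overshoot described under the learning factors.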

Algorithm 2 Bayesian-optimised PSO hyperparameters

Input: hyperparameter type; tuning range; algorithmic model PSO;
Output: Optimal parameter combination [H];
Initialising Bayesian optimisation
parameter_space = {
    'maxgen': (0, 100),
    'sizepop': (0, 10),
    'k' …: (0.1, 1.0)
}
for iteration in range(max_iterations) do
    fitness = evaluate_fitness(particle)
    bayesian_optimizer.update(particle['params'], fitness)
    new_params = bayesian_optimizer.suggest()
    update_particle(particle, new_params)
    if fitness > fitness_threshold then
        break
    end if
    new_fitness = evaluate_fitness(particle)
    if new_fitness < particle['best_fitness'] then
        particle['best_params'] = particle['params']
        particle['best_fitness'] = new_fitness
    end if
    bayesian_optimizer.update_model()
    if any(particle['best_fitness'] > fitness_threshold for particle in swarm) then
        break
    end if
end for
The optimal combination of parameters is updated as the hyperparameter input to the PSO algorithm:
return [H]

The steps of the Bayesian algorithm for optimising the hyperparameters of the PSO algorithm are as follows:

Step 1. Randomly initialise the position and velocity of the particles.

Step 2. Calculate the fitness value of the particle based on the current position.

Step 3. Update the individual best position and group best position for each particle.

Step 4. Adjust the parameters such as inertia weights, acceleration coefficients, etc. according to the Bayesian algorithm.

Step 5. Update the velocity and position of the particle according to the new parameter values.

Step 6. Determine whether the required number of iterations has been reached; if so, end the algorithm, otherwise continue to iterate.

Step 7. Return the found optimal solution.

Figure 7 shows the flowchart of Bayesian optimisation of PSO parameters.

Figure 7.

Flowchart of Bayesian improvement of PSO parameters.

PSO improvement of SVM hyperparameters

PSO-SVM is a method of optimising an SVM with the PSO algorithm. When a support vector machine is used to solve classification or regression problems, two model parameters must be tuned: the kernel function parameter $g$ and the penalty parameter $C$ . The kernel function parameter $g$ characterises the feature width of the kernel function, and the penalty parameter $C$ characterises the degree of tolerance towards sample points that fall outside the error range when the support vector machine is solved. These two parameters have a great impact on the generalisation ability and solving accuracy of the support vector machine. Therefore, PSO is used to optimise the $C$ and $g$ parameters and obtain the optimal parameter combination. The PSO-SVM model pseudo-code is shown in Algorithm 3.

Algorithm 3 PSO-SVM model

Initialise the PSO parameters (num_particles, max_iterations, … c2)
Initialise the SVM parameter space (C, gamma)
Initialise particles with random positions and velocities within parameter space
Set global_best_position = None
Set global_best_fitness = infinity
for iteration in range(max_iterations): do
    for each particle in particles: do
        Evaluate fitness of the particle using SVM with current position
        Update particle's best known position and fitness if necessary
        Update global best position and fitness if necessary
    end for
    for each particle in particles: do
        Update particle's velocity based on PSO formula
        Update particle's position based on new velocity
        Ensure particle's position stays within parameter space
    end for
end for
Optimal_params = global_best_position
return [Optimal_params]

PSO improves the SVM parameters in the following steps:

Step 1. Randomly initialise the position and velocity of the particles.

Step 2. Convert particle positions to SVM model parameters.

Step 3. Calculate the fitness value (SVM model performance metric).

Step 4. Update the best position and global best position.

Step 5. Update the velocity and position of the particle according to the PSO algorithm.

Step 6. Convert the new particle positions to SVM model parameters.

Step 7. Obtain the optimal SVM model parameters.

The PSO-SVM algorithm flowchart is shown in Figure 8.

Figure 8.

PSO-SVM flowchart.
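The two-pass structure of Algorithm 3 can be sketched in plain Python as follows. This is an illustrative sketch under several assumptions, not the authors' MATLAB implementation: the SVM training step is replaced by `toy_fitness`, a hypothetical stand-in for a validation error whose minimum is placed arbitrarily at $C$ = 10, $g$ = 0.1; particles move in log10 space so that the wide ranges of $C$ and $g$ reported later in the forecast analysis are covered evenly; and the inertia weight 0.7 and swarm size 20 are hypothetical. The values $c_{1}$ = 1.3, $c_{2}$ = 1.17 and 60 iterations follow the forecast analysis.

```python
import math
import random

# log10 bounds for (C, g): C in [0.001, 100], g in [0.001, 1000]
BOUNDS = [(-3.0, 2.0), (-3.0, 3.0)]


def to_svm_params(position):
    """Convert a particle position (log10 scale) to the SVM parameters (C, g)."""
    return 10.0 ** position[0], 10.0 ** position[1]


def toy_fitness(C, g):
    """Hypothetical stand-in for the SVM validation error (minimum at C=10, g=0.1)."""
    return (math.log10(C) - 1.0) ** 2 + (math.log10(g) + 1.0) ** 2


def pso_search(fitness, bounds, num_particles=20, max_iterations=60,
               w=0.7, c1=1.3, c2=1.17, seed=0):
    rng = random.Random(seed)
    dims = range(len(bounds))
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(num_particles)]
    vel = [[0.0 for _ in dims] for _ in range(num_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [math.inf] * num_particles
    global_best_position, global_best_fitness = None, math.inf

    for _ in range(max_iterations):
        # Pass 1: evaluate every particle and update personal and global bests.
        for i in range(num_particles):
            fit = fitness(*to_svm_params(pos[i]))
            if fit < best_fit[i]:
                best_fit[i], best_pos[i] = fit, pos[i][:]
            if fit < global_best_fitness:
                global_best_fitness, global_best_position = fit, pos[i][:]
        # Pass 2: move every particle and clamp it to the parameter space.
        for i in range(num_particles):
            for d in dims:
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (global_best_position[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
    return to_svm_params(global_best_position), global_best_fitness


(best_C, best_g), best_fitness = pso_search(toy_fitness, BOUNDS)
```

To use the sketch with a real model, `toy_fitness` would be replaced by a function that trains an SVM regressor with the given $C$ and $g$ and returns its validation error.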

Combined modelling based on Bayesian-LightGBM and improved PSO-SVM

A prediction method based on the combined modelling of Bayesian-LightGBM and improved PSO-SVM is proposed, building on the discussion in Sections 2 and 3. The SCADA data of 5 MW WTGs are used as the original input dataset; a new dataset is obtained by feature dimensionality reduction using the sparse random projection matrix and the Bayesian-LightGBM model; and the new dataset is then used for prediction by the improved PSO-SVM regression model. The flowchart is shown in Figure 9, and the specific implementation process is described below.

Step 1. Input the SCADA dataset of the WTGs.

Step 2. Put the dataset into the sparse random projection matrix model for initial dimensionality reduction to obtain the initial optimised dataset of 30 dimensions.

Step 3. Import the reduced dataset obtained in Step 2 into the Bayesian-LightGBM model for feature visualisation and analysis to obtain the final input feature parameters.

Step 4. Determine the optimal hyperparameters of the PSO algorithm using the Bayesian algorithm.

Step 5. Import the PSO hyperparameters determined in Step 4 and the feature dataset obtained in Step 3 into the PSO-SVM regression model to obtain the prediction dataset.

Figure 9.

Flowchart of Bayesian-LightGBM and improved PSO-SVM prediction algorithm.

Example simulation

Original feature dataset

The SCADA operation data of the WTGs comprise 78 input characteristic parameters, including active power, impeller speed, generator speed, ambient wind speed, ambient temperature, nacelle temperature, gearbox oil temperature, hydraulic system oil temperature, hydraulic system oil pressure, grid-side phase A current, grid-side phase B current, grid-side phase C current, grid-side phase A voltage, grid-side phase B voltage, grid-side phase C voltage, grid-side ab line voltage, grid-side bc line voltage, grid-side ca line voltage, etc. Table 1 shows the actual operation data of a 5 MW wind turbine in a wind farm with a sampling period of 10 min.

Table 1.

Actual operating data of a 5 MW wind turbine.

Time	Active power (kW)	Impeller speed (r/s)	…	Gearbox high-speed bearing temperature rise (°C)
8/13/22 0:00	0	0.06	…	12.2
8/13/22 0:10	0	−0.1	…	12.2
8/13/22 0:20	0	−0.02	…	11.9
8/13/22 0:30	0	−0.1	…	11.7
8/13/22 0:40	0	0.28	…	11.8
8/13/22 0:50	0	−0.1	…	11.8
8/13/22 1:00	0	0.07	…	11.9
8/13/22 1:10	0	−0.02	…	11.6
8/13/22 1:20	0	−0.1	…	11.4
8/13/22 1:30	0	−0.01	…	11.4
8/13/22 1:40	0	0.02	…	11.6
8/13/22 1:50	293.8	12.88	…	15.5
8/13/22 2:00	201.9	11.11	…	19.4
8/13/22 2:10	269.4	12.41	…	21.4
8/13/22 2:20	351.3	13.34	…	22.6
8/13/22 2:30	298.9	12.66	…	23.8
8/13/22 2:40	363.7	13.27	…	24.7
8/13/22 2:50	643.6	16.28	…	26.5
…	…	…	…	…
8/28/22 22:40	0	0.3	…	13.9
8/28/22 22:50	0	−0.09	…	13.8

There is a correlation between the operational characteristic parameters in the SCADA data of WTGs, including Active power, Impeller speed, Generator speed, Ambient wind speed, Ambient temperature, etc., but these data are not necessarily correlated with the high-speed bearing temperature rise data of WTG gearboxes. Therefore, in order to reduce the redundancy of data parameter types, improve the accuracy of predicting the temperature rise of high-speed bearings in WTG gearboxes, reduce the model training time, and improve the efficiency of model training, it is necessary to select a subset of feature parameters that are correlated with the data of high-speed bearing temperature rise in gearboxes.

In the feature engineering of this paper, feature dimensionality reduction is carried out first to reduce the dimensionality of the original data while retaining the correlation between the underlying data; feature selection is then applied to delete the feature parameters that do not help the prediction, which further reduces the number of feature parameters. Specifically, the sparse random projection matrix is first used to reduce the original 78-dimensional feature data to 30 dimensions. The improved Bayesian-LightGBM algorithm is then used for feature importance scoring and feature visualisation of the reduced dataset.

Feature dimensionality reduction

Feature engineering selects the more useful features from the original data features so as to improve the predictive regression ability of the model. It is an integral part of machine learning and is divided into feature extraction, feature construction and feature selection. Feature selection is an important factor affecting the accuracy of recognition results.⁴³ Useless information not only increases the subsequent computational cost but also has a great impact on classification recognition.⁴⁴

The sparse random projection matrix has three advantages. Firstly, it is computationally very efficient. Compared with other dimensionality reduction methods such as Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA),⁴⁵ it does not need to compute the covariance matrix or scatter matrix of the data, thus saving a lot of computational time and memory consumption. Secondly, it maintains the relative distance relationship between the data. This is because the sparse random projection matrix reduces the dimensionality with randomly selected projection vectors rather than with a transformation fitted to the data. Therefore, it maintains the characteristics of the original data better and ensures the correlation between the data. Finally, the sparse random projection matrix reduces the dimensionality of the data, which helps to reduce the storage space and computational complexity of the data and can improve the efficiency of subsequent data processing and analysis.⁴⁶
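A sparse random projection can be sketched in a few lines. The paper does not state which sparse distribution was used, so the sketch below assumes the common construction in which each matrix entry is $+\sqrt{s/k}$ or $-\sqrt{s/k}$ with probability $1/(2s)$ each and 0 otherwise, with $s = 3$ and $k$ the target dimension. The input samples are synthetic Gaussian values, not SCADA data, and the function names are hypothetical.

```python
import math
import random


def sparse_projection_matrix(n_features, n_components, s=3.0, seed=0):
    """Build an n_components x n_features sparse random matrix (assumed s = 3)."""
    rng = random.Random(seed)
    scale = math.sqrt(s / n_components)
    matrix = []
    for _ in range(n_components):
        row = []
        for _ in range(n_features):
            u = rng.random()
            if u < 1.0 / (2.0 * s):
                row.append(scale)       # +sqrt(s/k) with probability 1/(2s)
            elif u < 1.0 / s:
                row.append(-scale)      # -sqrt(s/k) with probability 1/(2s)
            else:
                row.append(0.0)         # zero with probability 1 - 1/s
        matrix.append(row)
    return matrix


def project(samples, matrix):
    """Multiply each sample vector by the projection matrix."""
    return [[sum(w * x for w, x in zip(row, sample)) for row in matrix]
            for sample in samples]


data_rng = random.Random(1)
# Five synthetic 78-dimensional samples standing in for SCADA records.
samples = [[data_rng.gauss(0.0, 1.0) for _ in range(78)] for _ in range(5)]

R = sparse_projection_matrix(n_features=78, n_components=30)
reduced = project(samples, R)  # five samples, 30 dimensions each
```

No covariance or scatter matrix is computed at any point, and roughly two thirds of the matrix entries are zero, which is where the savings in time and memory come from.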

The original 78-dimensional feature set is input into the sparse random projection model, and its dimensionality is reduced to 30 to obtain the new dataset. The dimensionality-reduced data are shown in Table 2.

Table 2.

Preliminary dimensionality-reduced data.

1	2	3	4	5	…	30
3.192	−4.001	−2.629	−1.419	2.975	…	−2.682
3.088	−4.034	−2.657	−1.081	2.902	…	−2.324
3.298	−3.965	−2.62	−1.082	3.161	…	−2.243
3.201	−3.938	−2.626	−1.079	3.27	…	−2.232
4.366	−3.96	−2.627	−1.35	4.24	…	−2.501
3.333	−3.965	−2.57	−1.147	3.147	…	−2.208
3.276	−3.972	−2.62	−1.215	3.095	…	−2.275
3.535	−3.918	−2.616	−1.017	3.431	…	−2.062
3.4	−3.863	−2.632	−1.08	3.23	…	−2.048
4.325	−3.897	−2.632	−0.879	3.885	…	−1.836
3.142	−3.926	−2.586	−0.881	2.709	…	−1.825
3.584	0.263	−1.151	0.023	2.844	…	0.259
2.197	0.175	−1.036	−0.365	1.782	…	1.018
3.058	0.505	−0.568	−0.076	2.918	…	0.716
0.734	0.325	−0.629	0.09	1.029	…	0.736
2.571	0.194	−0.437	−0.173	2.667	…	0.475
0.457	0.601	−0.346	0.041	0.96	…	0.539
0.097	1.02	−0.035	0.329	0.158	…	−0.042
…	…	…	…	…	…	…
4.61	−2.554	−2.402	−0.139	4.289	…	0.821

Next, the reduced data are input into the Bayesian-LightGBM model as a new set of feature parameters for feature visualisation, so that the unimportant features can be deleted and the feature selection completed. Combining sparse random projection dimensionality reduction with the Bayesian-LightGBM model for feature engineering has two advantages. Firstly, the sparse random projection reduces the dimension of the feature space and improves the accuracy of feature selection. Secondly, applying the Bayesian optimisation algorithm to the LightGBM model makes the search for the feature subset more efficient and improves the efficiency of the model. The feature visualisation also yields a score for each feature that explains its importance, which reduces manual intervention.

In the LightGBM model, the learning rate, the maximum depth of the tree, the minimum number of leaf node samples and the number of leaf nodes are selected as the hyperparameters to be optimised by the Bayesian algorithm. These hyperparameters are treated as continuous values within the Bayesian optimisation search space and are used to define and evaluate the performance of the model. Based on the previous evaluation results and the Gaussian process model, a new combination of hyperparameters is selected within the defined search space and the next round of training and evaluation is performed. This process is iterated, each time selecting the combination of parameters most likely to improve performance under the current estimate. After 50 iterations the Bayesian optimisation algorithm converges to the combination of hyperparameters with the best performance. In this case, the optimised values of the learning rate, maximum depth of the tree, minimum number of samples at leaf nodes and number of leaf nodes are 0.0768, 18.1271, 7.4061 and 22.558 respectively. This gives the combination of hyperparameters for the LightGBM model.
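Three of the four optimised values are integer-type parameters, so they must be rounded before being passed to LightGBM, as Step 7 of Algorithm 1 requires. The short sketch below assumes rounding to the nearest integer, which the paper does not state explicitly. The key names `num_leaves`, `max_depth` and `min_child_samples` follow Step 7 of Algorithm 1; `learning_rate` is an assumed key name in the same style.

```python
# Optimum reported by the Bayesian search (continuous values).
raw_params = {
    "learning_rate": 0.0768,
    "max_depth": 18.1271,
    "min_child_samples": 7.4061,
    "num_leaves": 22.558,
}

INTEGER_PARAMS = ("num_leaves", "max_depth", "min_child_samples")

# Round only the integer-type parameters (assumed rule: nearest integer).
params = {
    name: int(round(value)) if name in INTEGER_PARAMS else value
    for name, value in raw_params.items()
}

# Mirror of the type check in Step 7 of Algorithm 1.
assert all(isinstance(params[name], int) for name in INTEGER_PARAMS)
```

Under this assumption the tree depth becomes 18, the minimum number of leaf samples 7 and the number of leaf nodes 23, while the learning rate stays continuous.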

The scores of the individual features obtained by the Bayesian-optimised LightGBM are visualised in Figure 10. The top 20% of features are selected as the feature parameters for the subsequent inputs, and their feature scores are 177.7, 158, 143, 118.9, 87.1 and 76. The data filtered in this way are used as inputs to the subsequent model to obtain the wind turbine dataset.

Figure 10.

Bayesian optimisation of LightGBM feature visualisation operations.

Combining the above two modelling algorithms retains, to the maximum extent, the feature parameter data related to the temperature rise of high-speed bearings of wind turbine gearboxes, which reduces the dimensionality of the subsequent input parameters and improves the training accuracy and efficiency.

Forecast analysis

Seventy percent of the selected feature data is used as the training set for the Bayesian-PSO-SVM model, and the remaining 30% is used as the test set to evaluate the prediction performance of the model. The predicted values are obtained by inputting the respective datasets into the Bayesian-PSO-SVM model. In this experiment, the PSO hyperparameters obtained by the Bayesian algorithm are as follows: the learning factors $c_{1}$ and $c_{2}$ are 1.3 and 1.17, the maximum number of iterations maxgen is 60, and the population size sizepop is 149. The ranges of the penalty factor $C$ and the kernel function parameter $g$ in the SVM algorithm are $C \in$ [0.001,100] and $g \in$ [0.001,1000]. Figure 11 shows the best fitness curve of the Bayesian-optimised PSO. The optimised PSO comes close to the best fitness by the 32nd generation, so it finds the best fitness in a short time; the curve declines and then stabilises, indicating that the algorithm has converged. In the SVM model, the penalty parameter $C$ and the kernel function parameter $g$ are the hyperparameters optimised by the PSO algorithm. The position of each particle represents the values of $C$ and $g$ in the SVM; the positions and velocities of the particles are updated iteratively according to the fitness of each particle until the stopping condition is met (either convergence or the maximum number of iterations), which gives $C$ = 7.61 and $g$ = 0.1.

Figure 11.

Bayesian-PSO-SVM model optimal fitness.

In this paper, mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE) and coefficient of determination ( $R^{2}$ ) were selected as the evaluation indexes of this prediction effect.

MAE is the mean of the absolute errors between the predicted and true values and reflects the actual size of the prediction error. MAPE is the mean of the absolute relative errors between the predicted and true values and reflects the size of the prediction error relative to the true value. RMSE is the square root of the ratio of the sum of squared errors between the predicted and true values to the number of samples and reflects the degree of dispersion of the error distribution.

$R^{2}$ is 1 minus the ratio of the sum of squares of the residuals ( $X^{2}$ ) to the total sum of squares. It reflects the goodness of fit of the algorithm: the closer $R^{2}$ is to 1, the better the fit.

The formulas for MAE, RMSE, MAPE and $R^{2}$ are as follows:

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |

(16)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(17)

MAPE = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{y_{i} - {\hat{y}}_{i}}{y_{i}} \right| \times 100\%

(18)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {({\hat{y}}_{i} - y_{i})}^{2}}{\sum_{i = 1}^{n} {(\bar{y} - y_{i})}^{2}}

(19)

where $n$ is the number of samples, $y_{i}$ represents the true value, ${\hat{y}}_{i}$ is the predicted value, and $\bar{y}$ represents the average of the true values.
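Equations (16) to (19) translate directly into code. The following sketch computes the four indexes for a short example; the true values are four temperature rise readings taken from Table 1, while the predicted values are hypothetical and serve only to exercise the formulas. The function name `regression_metrics` is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, MAPE in percent, R^2) following equations (16)-(19)."""
    n = len(y_true)
    errors = [y - p for y, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = sum(abs(e / y) for e, y in zip(errors, y_true)) / n * 100.0
    y_mean = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)      # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, mape, r2


y_true = [15.5, 19.4, 21.4, 22.6]   # temperature rise values from Table 1 (°C)
y_pred = [15.0, 20.0, 21.0, 23.0]   # hypothetical predictions for illustration

mae, rmse, mape, r2 = regression_metrics(y_true, y_pred)
```

Note that MAPE divides by the true value, so it is undefined when a true value is zero; this does not arise for the temperature rise data shown in Table 1.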

Figures 12 and 13 compare the predicted and actual values for the training and test sets of the Bayesian-PSO-SVM model. The training set prediction error stays within [−2,0.5] and the test set error within [−0.8,0.8], and the prediction errors are randomly distributed around the mean, which indicates that the model has good stability.

Figure 12.

Comparison of predicted and actual values of training set of Bayesian-PSO-SVM model.

Figure 13.

Comparison of predicted and actual values of testing set of Bayesian-PSO-SVM model.

The prediction indexes of the Bayesian-PSO-SVM model are shown in Table 3. The MAE of both the training set and the test set is small, around 0.5; the RMSE is larger, around 0.7 to 0.8, indicating a certain degree of dispersion in the prediction results; the MAPE is between about 1.5% and 1.9%; and the $R^{2}$ of both sets is close to 1. Overall, the Bayesian-PSO-SVM model established in this paper is well suited to predicting the temperature rise of high-speed bearings of wind turbine gearboxes on both the training set and the test set.

Table 3.

Bayesian-PSO-SVM model prediction results.

Evaluation indicators	MAE	RMSE	MAPE (%)	$R^{2}$
Training set	0.544	0.821	1.88	0.994
Testing set	0.452	0.695	1.46	0.996

To verify the necessity of feature preprocessing for data prediction, the original data are also fed directly into the Bayesian-PSO-SVM regression prediction model, and the results are compared with those obtained after feature preprocessing. The evaluation metrics are shown in Table 4. The test set prediction plot of the Bayesian-PSO-SVM model for unprocessed data is shown in Figure 14.

Table 4.

Experimental results of each prediction algorithm with optimised.

Data	Model	MAE	RMSE	MAPE	$R^{2}$	Time (s)
Data preprocessing	Random forest	1.645	2.421	5.862%	0.941	2.151
	BP neural network	1.161	2.006	29.09%	0.954	1.325
	SVM	0.969	1.828	3.56%	0.979	0.384
	PSO-SVM	0.956	1.347	3.06%	0.981	10.149
	Bayesian-PSO-SVM	0.544	0.821	1.88%	0.994	19.520
Original data	Bayesian-PSO-SVM	0.663	0.978	2.58%	0.99	57.124

Figure 14.

Bayesian-PSO-SVM model testing set prediction plots for unprocessed data.

Figures 13 and 14 show that when the unprocessed data are entered into the model as feature parameters, the predicted values deviate further from the actual values than those of the model with data preprocessing, and the errors are more dispersed and more widely distributed. This reflects the necessity of preprocessing the WTG data.

To verify the effectiveness of the Bayesian-PSO-SVM model in predicting the temperature rise of high-speed bearings of gearboxes in wind farms, the Random Forest model, the BP neural network model and the SVM model were each applied for prediction, and the PSO-SVM model was also included for comparison with the Bayesian-PSO-SVM model.

The fitting results in Figures 15–17 show that, when the high-speed bearing temperature rise changes, the SVM model responds more quickly than the BP neural network and the random forest algorithm: when the temperature rise fluctuates strongly, the SVM model reacts promptly and predicts in time, whereas the other two algorithms cannot reflect the change in a timely manner, so their prediction results are not satisfactory. At the same time, the error of the SVM model is kept within [−4,6]. Among the BP neural network, the random forest algorithm and the SVM model, the SVM model therefore has the better prediction accuracy and robustness, so SVM is chosen as the prediction algorithm model.

Figure 15.

Plot of SVM algorithm predicted values against actual values.

Figure 16.

Comparison between predicted and actual values of BP neural network.

Figure 17.

Plot of predicted versus actual values of the Random Forest model.

Comparing Figures 15 and 18 shows that the PSO-SVM model reflects the trend of the original data better than the ordinary SVM model. Within small intervals of rise and fall it follows the changes of the original values more promptly and stays closer to the actual curve, whereas the SVM model cannot follow small changes in time and produces larger error variations.

Figure 18.

Plot of PSO-SVM model predicted values versus actual values.

Firstly, the Bayesian algorithm provides a smarter parameter search pattern, which adjusts the range of the parameter space based on the previous search results and thus converges faster towards the optimal solution. This helps to improve the performance of the model. Secondly, the PSO algorithm is a global optimisation method, which is favourable for finding the optimal solution, and together with the Bayesian algorithm it can avoid falling into a local optimum. Finally, Bayesian-PSO-SVM combines the advantages of the Bayesian model and Particle Swarm Optimisation and has stronger adaptive and iterative capabilities. It is able to adjust the search direction according to the results obtained so far, thus optimising the model more flexibly and improving the prediction accuracy. Therefore, after feature engineering, the Bayesian-PSO-SVM model outperforms the other algorithms.

The error indexes obtained from the above experiments are shown in Table 4. The results show that the Bayesian-PSO-SVM model without the feature engineering preprocessing performs less well in the actual wind farm application. When no feature engineering is performed, the Bayesian-PSO-SVM model obtains a MAPE of 2.58%, a MAE of 0.665, an RMSE of 0.978 and an $R^{2}$ of 0.990, with an error distribution within [−7, 2.5]. Using feature engineering for data preprocessing, screening and processing clearly improves the accuracy of the model: the model proposed in this paper has a MAE of 0.544, an RMSE of 0.821, a MAPE of 1.88% and an $R^{2}$ of 0.994, which proves the necessity of feature engineering for the application of the model. After feature engineering, the MAE, RMSE and MAPE decrease to different degrees from the SVM model to the PSO-SVM model and the Bayesian-PSO-SVM model, and $R^{2}$ improves, which reflects the improvement of the SVM model by the intelligent algorithms. As can be seen from Table 4, with each algorithm run 10 times on a small-scale input dataset, the SVM algorithm has a running time of 0.384 s and a coefficient of determination of 0.979, making it the fastest of the BP neural network, random forest and SVM models and the one with the best prediction results. After optimisation by the particle swarm algorithm, the coefficient of determination of the PSO-SVM algorithm is better than that of the SVM algorithm, but the running time rises to 10.149 s. The running time of the model proposed in this paper is 19.520 s, which is longer than that of the PSO-SVM algorithm, but its accuracy is higher. The running time of the Bayesian-PSO-SVM model without data preprocessing is 57.124 s, which shows that the more complex the dataset, the longer the running time the algorithm requires. Therefore, this paper concludes that the Bayesian-PSO-SVM algorithm with feature engineering has the best comprehensive performance among the above algorithms.

As can be seen from Figure 19, the fitness value of the PSO-SVM model decreases rapidly in the third generation but does not converge; it reaches the convergence condition only gradually, in the 36th generation, after a long period of oscillation and slow approximation. By comparison, the Bayesian-PSO-SVM model proposed in this paper reaches convergence in a shorter time and is computationally more efficient.

Figure 19.

Fitness curves of the Bayesian-PSO-SVM model versus the PSO-SVM model.

A one-way analysis of variance (ANOVA) was carried out on the predicted values of the six models mentioned above, as follows.

Step 1: Establish the hypotheses: the null hypothesis H0 (all sample groups have equal means) and the alternative hypothesis H1 (at least one group mean is different from the others);

Step 2: Calculate the total sum of squares (the sum of the squares of the differences between all observations and the total mean), let it be SST;

Step 3: Calculate the between-group sum of squares (the sum, weighted by group size, of the squares of the differences between each group mean and the total mean), let it be SSB;

Step 4: Calculate the within-group sum of squares (the sum of the squares of the differences between each observation in a group and the mean value of that group), let it be SSW;

Step 5: Calculate the mean squares (MS). The between-group mean square (MSB) is:

MSB = \frac{SSB}{k - 1}

(20)

where $k$ is the number of groups.

The within-group mean square (MSW) is:

$$MSW = \frac{SSW}{N - k} \tag{21}$$

where $N$ is the total number of observations.

Step 6: Calculate the F statistic:

$$F = \frac{MSB}{MSW} \tag{22}$$

The F statistic is compared with the critical F-value at a chosen significance level $\alpha$ (usually 0.05 or 0.01). If F is greater than the critical F-value, the null hypothesis is rejected and at least one group mean is concluded to differ from the others. If F is less than or equal to the critical F-value, the null hypothesis cannot be rejected and no significant difference between the group means is found.
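The steps above can be sketched in a few lines of Python. This is a minimal illustration of equations (20) to (22), not the code used in the paper; the function name `one_way_anova` and the toy groups are hypothetical.

```python
def one_way_anova(groups):
    """Compute SST, SSB, SSW, MSB, MSW and F for a list of observation groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and N > k")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Step 2: total sum of squares.
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)
    # Step 3: between-group sum of squares, weighted by group size.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Step 4: within-group sum of squares.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Steps 5 and 6: mean squares and the F statistic.
    # MSW is zero only when every group is constant; that case is not handled here.
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return {"SST": sst, "SSB": ssb, "SSW": ssw, "MSB": msb, "MSW": msw, "F": msb / msw}


# Three hypothetical groups of predictions (illustrative values only).
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
result = one_way_anova(toy_groups)
```

For the toy data the decomposition SST = SSB + SSW holds (12 = 6 + 6) and F = 3.0, which is below the tabulated critical value of about 5.14 for (2, 6) degrees of freedom at $\alpha$ = 0.05, so the null hypothesis would not be rejected.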

The ANOVA yielded a p-value of 0.770. Because this is greater than the significance level (0.770 > 0.05), there is not enough evidence to reject the null hypothesis, so no significant difference is found between the group means. In other words, the predictions are statistically consistent across the groups.

The above results demonstrate the necessity of dimensionality reduction by sparse random projection and of data screening with the Bayesian-LightGBM model, the superiority of the Bayesian-PSO-SVM model in regression prediction, and the effectiveness of the method in predicting the temperature rise of high-speed gearbox bearings in real WTGs.

Conclusions

Building on previous research, and to address the problem that a single algorithm cannot accurately predict the temperature rise of the high-speed bearings in wind turbine gearboxes, this paper proposes a combined modelling method based on wind turbine operation data. The method combines the sparse random projection matrix model, the Bayesian-LightGBM model and the Bayesian-PSO-SVM model to predict the temperature rise of the high-speed bearings of wind turbine gearboxes. The main conclusions are as follows.

(1) A preliminary reduced data set is obtained by applying a sparse random projection matrix to the original data. The Bayesian-LightGBM model is then used to select features from this data set and to delete features irrelevant to the prediction result, which reduces model complexity and running time and improves the prediction accuracy of the subsequent model.

(2) A Bayesian-PSO-SVM prediction model was established. Its coefficient of determination $R^{2}$ was 0.994, indicating a good fit, and the average relative error on the test set was 2.578%, which is small.

(3) Several different algorithms were used to verify the regression prediction. The Bayesian-PSO-SVM prediction model outperforms the other prediction models, with an RMSE of 0.821, which is significantly better than the traditional BP neural network and random forest algorithms, and provides a new method for predicting the temperature rise of high-speed bearings in wind turbine gearboxes.

(4) Comparing the Bayesian-PSO-SVM predictions obtained from the original data with those obtained from the preprocessed data shows that preprocessing improves the accuracy of the prediction model: the MAE, RMSE and MAPE decrease to different degrees and the $R^{2}$ improves. This further demonstrates the necessity of data preprocessing for prediction.
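The first stage summarised in conclusion (1), dimensionality reduction by a sparse random projection matrix, can be illustrated with the sketch below. It assumes the commonly used construction in which each matrix entry is $+\sqrt{s/k}$ or $-\sqrt{s/k}$ with probability $1/(2s)$ each and zero otherwise, where $k$ is the target dimension and $1/s$ is the density, defaulting to $1/\sqrt{d}$ for $d$ input features. The function names, the default density and the seed are assumptions for illustration and do not reproduce the MATLAB implementation used in the experiments.

```python
import math
import random


def sparse_projection_matrix(n_features, n_components, density=None, seed=0):
    """Build an n_features x n_components sparse random projection matrix."""
    if density is None:
        density = 1.0 / math.sqrt(n_features)  # assumed default, i.e. s = sqrt(d)
    rng = random.Random(seed)
    scale = math.sqrt(1.0 / (density * n_components))  # equals sqrt(s / k)
    matrix = []
    for _ in range(n_features):
        row = []
        for _ in range(n_components):
            u = rng.random()
            if u < density / 2:
                row.append(scale)      # probability 1 / (2s)
            elif u < density:
                row.append(-scale)     # probability 1 / (2s)
            else:
                row.append(0.0)        # probability 1 - 1 / s
        matrix.append(row)
    return matrix


def project(samples, matrix):
    """Project each sample (a list of n_features values) onto n_components dimensions."""
    n_components = len(matrix[0])
    return [
        [sum(x * matrix[i][j] for i, x in enumerate(sample)) for j in range(n_components)]
        for sample in samples
    ]


# Hypothetical example: reduce 16 input features to 4 components.
projection = sparse_projection_matrix(16, 4, seed=1)
reduced = project([[1.0] * 16, [0.0] * 16], projection)
```

Because most entries are zero, the projection is cheap to apply to wide SCADA records before the feature selection stage.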

The next step is to extend the data screening and feature engineering used in the fault diagnosis of high-speed bearing temperature rise in 5 MW WTGs based on the Bayesian-LightGBM and improved PSO-SVM models, so as to improve diagnostic accuracy as far as possible. Although the model proposed in this paper achieves a certain degree of accuracy, wind power and deep learning technologies continue to develop, so the algorithms will need to be updated in future work to further improve accuracy and prediction efficiency and to meet practical needs.

Supplemental Material

Supplemental material, sj-png-1-mac-10.1177_00202940241280051 for Temperature rise of high-speed bearing in gearbox of 5 MW wind turbine based on Bayesian-LightGBM and improved PSO-SVM troubleshooting by Minan Tang, Zhanglong Tao, Jiandong Qiu, Jinping Li, Mingyu Wang, Hongjie Wang and Chuntao Rao in Measurement and Control

Footnotes

Acknowledgements

The authors gratefully acknowledge the support they received from the National Natural Science Foundation of China (grant numbers 62363022, 61663021, 71763025 and 61861025); Natural Science Foundation of Gansu Province (grant number 23JRRA886); Gansu Provincial Department of Education: Industrial Support Plan Project (grant number 2023CYZC-35).

Author Contributions

Conceptualization, M.T.; data curation, Z.T.; formal analysis, Z.T., J.L. and M.W.; funding acquisition, M.T.; investigation, Z.T., J.Q., J.L., H.W. and C.R.; methodology, M.T., Z.T. and J.Q.; project administration, M.T.; resources, M.T., M.W. and C.R.; software, Z.T.; supervision, M.T. and J.Q.; validation, J.Q. and H.W.; visualization, Z.T. and J.Q.; writing (original draft preparation), M.T. and Z.T.; writing (review and editing), M.T. and J.Q. All authors have read and agreed to the published version of the manuscript.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the National Natural Science Foundation of China (grant numbers 62363022, 61663021, 71763025 and 61861025); Natural Science Foundation of Gansu Province (grant number 23JRRA886); Gansu Provincial Department of Education: Industrial Support Plan Project (grant number 2023CYZC-35).

ORCID iD

Zhanglong Tao

Data availability statement

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

