Sage Journals: Discover world-class research

Abstract

Power transformers are crucial components of power transmission and transformation networks. Their operational status has a direct impact on the reliability of power supply systems. As such, the security and stability of power systems depend heavily on the state of transformers within them. The oil temperature of a transformer is a critical indicator of its working condition. Accurately and rapidly predicting transformer oil temperature is therefore of significant practical importance for ensuring the safe and effective operation of power systems. To address this prediction problem, this article proposes a transformer oil temperature prediction method based on empirical mode decomposition-bidirectional long short-term memory (EMD-BiLSTM). The time series of oil temperature is first cleaned before being processed. Next, the EMD algorithm is used to decompose the time series into relatively stable components. The BiLSTM neural network is then utilized to predict the complex nonlinear long-term series. The proposed method is evaluated using the open data set Electricity Transformer Temperature (ETT)-small. Experimental results show that the EMD-BiLSTM model outperforms traditional LSTM, BiLSTM, EMD-BP, and Wavelet Transform-Bidirectional Long Short-Term Memory (WT-BiLSTM) methods, demonstrating that it is an effective and accurate prediction method for transformer oil temperature.

Keywords

Power transformer working state EMD-BiLSTM Oil temperature prediction

Introduction

The power transformer is a crucial component in the power transmission system,^1,2 and its proper functioning is vital for the reliability of the power supply.³ Monitoring and detecting abnormal working conditions of transformers poses a significant challenge. Most commonly used transformers are oil-immersed, with the oil performing functions such as insulation, heat dissipation, and arc-fencing. It also prevents moisture from entering the components, thus serving as a safeguard. Additionally, the oil acts as a coolant and can eliminate fire arcs generated by high temperatures. Thus, the oil plays a significant role in the transformer's operation, and its temperature provides a direct indication of the transformer's working state.

The temperature of the transformer oil is a crucial indicator of the transformer's operational state. The operation and equipment safety of the transformer depend greatly on the effective and fast forecast of the transformer oil temperature. The oil temperature is often created as a time series, thus the forecast of the oil temperature is usually based on the prediction methods of the time series. For instance, Yu Xi et al.⁴ utilized support vector regression machine (SVM) with a three-phase power load and particle swarm optimization (PSO) for model parameter optimization. Fei Xiao et al.⁵ proposed an intelligent forecasting approach utilizing machine learning techniques such as a decision forests algorithm with a dynamic association learning model. Wei Rao et al.⁶ developed a Bayesian network and association rules-based transformer oil temperature forecasting approach, which improved the predictive power of RBF-NN for forecasting transformer oil temperature. A new model based on fundamental heat transport theory was suggested by Ali Taheri and others.⁷ The model uses an electro-thermal resistance model to estimate the thermal behavior of the top oil of interior distribution transformers (E-TRM). By assigning thermal resistance to each three-dimensional heat transfer channel, the thermal resistance network is constructed.

Time series prediction analysis is a kind of complex prediction modeling method. It predicts the data information of the event in the future from the event data generated in the past. The time series models rely on the sequence of events. If the order of values of the time series changes, the results of the output of model are various.⁷ Traditional prediction methods such as moving average method and exponential average method are used for time series analysis. However, deep learning methods, particularly deep neural networks (DNN), are increasingly employed in various fields, including time series prediction tasks. These DNN models include recurrent neural network (RNN), long short-term memory (LSTM),⁸ gated recurrent unit (GRU),⁹ stacked LSTM, stacked GRU, bidirectional LSTM, bidirectional GRU, and CNN-LSTM.¹⁰ While RNN has the ability to “remember” previous data, it suffers from the problem of gradient dissipation during information feedback. LSTM was introduced to address this issue by incorporating historical state, current memory, and current input control unit to model long sequences. The LSTM model replaces traditional hidden nodes with memory cells to prevent gradient disappearance or explosion during long-time training. BiLSTM combines the advantages of both forward and backward information to improve prediction accuracy and avoid gradient issues caused by time dependence.^8,10 In this article, we propose the use of EMD-BiLSTM to establish a prediction model for transformer oil temperature.^11–13 The key contributions of this article include: a transformer oil temperature prediction method is proposed that combines the empirical mode decomposition (EMD) algorithm with the BiLSTM neural network; The proposed EMD-BiLSTM model is tested on the ETT-small dataset and results show that it outperforms traditional LSTM and BiLSTM models in terms of accuracy; the findings of this research suggest that the proposed EMD-BiLSTM model is an effective and accurate method for predicting transformer oil temperature, which has important implications for the operation and safety of power systems.

EMD-BiLSTM-based time series prediction models

Empirical mode decomposition

Specifically designed for the processing of nonlinear and nonstationary time series, the EMD¹¹ algorithm is a novel time-frequency analysis approach in signal processing. The EMD method, in contrast to the Fourier and Wavelet transforms, may break down and analyze signals in accordance with its own time scale features without any prior information. EMD is regarded as a significant improvement over conventional time-frequency analysis techniques based on linear and stationary assumptions, such as Fourier analysis and Wavelet transform.^12,13 The time series may be broken down using the EMD technique into finite intrinsic mode functions (IMFs) and a trend term (residual). The deconstructed IMF components depict the original time series’ fluctuation data on several time scales. Determining all extreme points for a particular time series is the first step in the EMD decomposition process. Next, the maximum and minimum points are interpolated and fitted to produce the upper and lower envelopes, x_max(t) and x_min(t), respectively. The time series has a mean value of m(t), and the average value may be taken out of the initial time series. h(t) = x (t)−m(t) is the new time series that is created as a result h(t). The number of the signal's zero points and extreme points must match or differ by no more than one for h(t) to be an IMF component. The second requirement is that the signal has zero mean. h(t) = x (t)−m(t) iterations will be performed until the two constraints are satisfied. The aforementioned screening processes are continued until a monotonic sequence, also known as a constant sequence R_n, is found. The removal of the IMF component from the original time series occurs each time an IMF component is discovered. The EMD decomposition expression of original time series is following

x (t) = \sum_{i = 1}^{n} IM F_{i} + R_{n}

(1)

The RNN model

RNN is one of the most common and strongest tools of the time-series prediction models.^14,15 The computing results of the common neural network are independent of each other, and the calculation results in each hidden layer of RNN are composed of the present input and the computation results of the previous hidden layer.¹⁶

As can be seen in Figure 1, the right part of the figure is the expanded structure to help understand memory. X is the input layer, O is the output layer, s is the hidden layer, and t is the amount of calculations. The weight is V, W, u, and the t-th hidden layer state is S_t = f(U*X_t + W*S_t₋₁). Thus, the current input is combined with the previous computation. RNN has some limitations. If the RNN model wants to achieve long-term memory, it is necessary to combine the calculation results of the present hidden state with the calculation results of the previous n times. Such as S_t = f(U*X_t + W₁*S_t−1 + W₂*S_t₋₂ + … + W_n*S_t−n) which leads to an exponential increase in the amount of calculation, and the training time of model will increase greatly. Therefore, we often directly use the RNN model for long-term memory calculations.¹⁶

Figure 1.

RNN network structure.

It is commonly known that the human brain does not always begin thinking from scratch. Rather, it has the ability to retain and recall past information. Traditional neural networks, however, lack this persistence. This is where RNNs come into play. By linking past knowledge to current processes, RNNs are able to emulate the brain's ability to make deductions based on previous experiences. For instance, while watching a video, our brain can comprehend the current frame by drawing on information from the previous frame. However, implementing RNNs requires many dependency factors, making the process more complex. Nonetheless, in certain cases, we only need information obtained prior to completing the current task.

Long short-term memory

A particular variant of RNN model is the LSTM. In order to efficiently learn the dynamic information of the input data sequence, LSTM projects the input to the hidden state and the hidden state to the output.^17,18 In the hidden layer of an RNN, there is only one state h that is responsive to short-term inputs. In contrast to the single RNN’s structure, the LSTM adds three control gates to learn long-term dependant information, which may be used to selectively change each time state in the RNN. The issue of RNN having just one hidden layer state h is resolved by LSTM by adding a state c on top of the RNN foundation in order to save the long-term state. Forgetting gate f_t, input gate i_t, output gate O_t, and one memory unit make up the LSTM unit structure. Figure 2 depicts the structure.

Figure 2.

The unit structure of LSTM.

In Figure 2, the yellow nodes represent “operating the elements one by one,” and there are two kinds: multiplication “Ä” getting the dot product between elements, multiplying point by point; and addition “Å” getting the sum. In other words, the two vectors of the same dimension are multiplied or added by the corresponding elements after being passed through the yellow operation box. The pink node represents “activation operation,” and also there are two kinds: namely σ function and tanh function. If the two lines are fused in the direction of the arrow, they are simply stacked as mentioned above; if one line is divided into two lines, they are reproduced into the same two copies. Where $x_{t}$ is the input data at moment t, $h_{t}$ is the output state value of the LSTM unit at moment t, $c_{t}$ is the candidate value of the memory unit at moment t, $i_{t}$ is the state value of input gate at moment t, $f_{t}$ is the state value of forgetting gate at moment t, $o_{t}$ is the state value of output gate at moment t, W is the corresponding weight, and b is the corresponding offset parameter. The state value of the memory unit is regulated by the input gate and the forgetting gate.

At moment t, the LSTM unit has three inputs, that is, the input of the network state at current moment $x_{t}$ , the input at last moment $h_{t - 1}$ , and the unit state at last moment $c_{t - 1}$ , two outputs, that is, the output of current moment $h_{t}$ and the unit state of current moment $c_{t}$ , where, x, c, and h are all vectors. The key issue of the LSTM unit is how to control the long-term state c. The three control gates of the LSTM unit are the input gate, the output gate, and the forgetting gate. The input gate controls and saves the long-term state c. The output gate controls that let the real-time state input into the long-term state c. The forgetting gate control whether let the long-term state c as the output of the LSTM unit at current moment. The control of the forgetting gate enables the LSTM unit to save the previous information. The control of input gate enables the LSTM unit to avoid the current irrelevant content entering into the memory. In fact, the three control gates are full-connected layers, whose inputs are vectors and whose outputs are real number vectors between 0 and 1.

Suppose W is the weight vector of the gate and b is the offset parameter. Then, the gate can be expressed as:

g (x) = σ (W x + b)

(2)

where

σ

is a sigmoid function, whose value range is (0, 1).

The contents of the unit state c are controlled by the input gate and the forgetting gate in the LSTM network. The input gate determines how much the input $x_{t}$ of the network at current moment is saved to the unit state $c_{t}$ , and the forgetting gate determines how much the unit state $c_{t - 1}$ at the previous moment is saved to the unit state $c_{t}$ at the current moment. While the output gate controls how much the unit state $c_{t}$ is output to the output $h_{t}$ at current moment.

The forgetting door determines how much of the cell state $c_{t - 1}$ information from the previous moment is retained in the current state $c_{t}$ ,

f_{t} = σ (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})

(3)

where

W_{f}

is the weight matrix of the forgetting gate,

[h_{t - 1}, x_{t}]

connects two vectors into a longer vector,

b_{f}

is the offset parameter of the forgetting gate.

The input gate controls the input of new information,

i_{t} = σ (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i})

(4)

c_{t}^{'} = \tanh (W_{c} \cdot [h_{t - 1}, x_{t}] + b_{c})

(5)

Equation (5) is used to calculate for describing the unit state

c_{t}^{'}

of the input at current moment according to the output at previous moment and the input at current moment:

Then, get the unit state $c_{t}$ at the current moment t by:

c_{t} = f_{t} \cdot c_{t - 1} + i_{t} \cdot c_{t}^{'}

(6)

Equation (6) gets a new unit state

c_{t}

by adding two products: one is generated by multiplying the unit state

c_{t - 1}

at previous moment by the forgetting gate

f_{t}

, and another is multiplying the unit state

c_{t}^{'}

of the input at current moment by the input gate

i_{t}

element by element. So equation (6) combines the current memory

c_{t}^{'}

with the long-term memory

c_{t - 1}

The output gate is computed as,

o_{t} = σ (W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o})

(7)

h_{t} = o_{t} \cdot \tanh (c_{t})

(8)

The output of the LSTM unit is determined by both the output gate

o_{t}

and the unit state:

y_{t} = W_{y} \cdot h_{t}

(9)

LSTM can be trained by using the back propagation algorithm. The main steps are as follows:

Calculating the output values ( $f_{t}$ , $i_{t}$ , $o_{t}$ , $c_{t}$ , $x_{t}$ , $h_{t}$ ) of each neuron in the forward direction.

Calculating the error terms of each neuron in the backward direction. There are two dimensions: one is the backpropagation along the time, that is, from the current time t, the error terms at each moment are calculated; the other is the propagation of the error term to the previous level.

According to the corresponding error term, the gradient of each weight is calculated.

The weight is updated by the algorithm of error backpropagation with gradient descent.

Bidirectional LSTM

BiLSTM is an extension of LSTM. It can improve the model performance of the sequence classification problems.^19–21 Among all the time step sizes available for the input sequence, BiLSTM trains two rather than one LSTM on the input sequence. The first in the input sequence is as-is, and the second is an inverted copy of the input sequence. It can provide additional context for the network and lead to faster, even fuller learning problems. The improved BiLSTM model with better generalization ability is proposed based on LSTM. To accelerate network convergence and improve the prediction results, the maximum pooling operator is introduced to avoid the problem of similarity between sampling environment information sequences. The structure of BiLSTM is shown in Figure 3.

Figure 3.

The structure of BiLSTM.

As shown in Figure 3, through the forward LSTM and the backward LSTM units, the forward and backward output are combined as the output of the hidden layer. Set the LSTM unit processing data in different directions in its hidden layer:

\begin{aligned} h_{f t} = & H (W_{x h_{f}} x_{t} + W_{h_{f} h_{f}} h_{f_{t - 1}} + b_{h_{f}}) \\ h_{b t} = & H (W_{x h_{b}} x_{t} + W_{h_{b} h_{b}} h_{b_{t - 1}} + b_{h_{b}}) \end{aligned}

(10)

where

h_{f}

and

h_{b}

are the hidden layers of the forward layer and the backward layer, respectively, H is the activation function,

W_{*}

and

b_{*}

are the weight and offset parameters of the model.

Then, the probability of the relationship between the occurrence of oil temperature and the previous oil temperature values in the input model is,

p (y | X) = σ (W_{h_{f^{z}}} h_{t} + W_{h_{b^{z}}} h_{t} + b_{z})

(11)

where

W_{h_{f^{z}}}

W_{h_{b^{z}}}

b_{z}

are the parameters to be trained and

σ

is the excitation function.

Oil temperature time series prediction based on EMD-BiLSTM

Direct prediction on the raw data is difficult to achieve the optimal effect for non-stationary time series data. We can apply the combination algorithm of EMD and BiLSTM to time series prediction. EMD is a data-driven signal analysis technique that decomposes a signal into a set of IMFs, which better capture the underlying characteristics of the signal compared to traditional Fourier transforms. EMD algorithm is used to separate the time series data.¹¹ The individual component data and original data are predicted, modeled, and reconstructed, respectively.¹² EMD is to separate complex signals into limited IMFs. After decomposition, each IMF component contains the part characteristic signals of the primary signal in different time scales, and noise reduction is realized.¹³ The BiLSTM network is a type of deep learning network that is designed to capture both past and future dependencies in a time series. It consists of two LSTM networks, one processing the input sequence in forward direction and another processing the same sequence in reverse direction, and the final output is the concatenation of their outputs. This allows the BiLSTM network to capture both the past and future dependencies in the time series, which is important for many time series prediction tasks.

EMD decomposition

EMD method can convert non-stationary nonlinear data into stationary linear ones, which plays a great auxiliary role in fully mining the hidden time series relationship of the temperature forecasting model. The steps are as described in the Empirical mode decomposition section.

Combining EMD-BiLSTM algorithm

The proposed EMD-BiLSTM model is a combined model that leverages the strengths of both the EMD and the BiLSTM network to improve the accuracy of time series predictions. The EMD-BiLSTM model can be used for a variety of time series prediction tasks, including weather forecasting, energy consumption prediction, and stock price prediction, among others. The EMD-BiLSTM model could be used to predict the temperature of the oil in power transformers. The oil temperature time series data is separated into several IMF parts and a residual part by EMD algorithm, and then the component data is transformed into multidimensional input matrix. The BiLSTM network is used to model and predict the component data, respectively. Finally, the prediction results of every component are reconstructed. The temperature prediction results are reached, and the EMD-LSTM-based forecasting process is illustrated in Figure 4.

Figure 4.

Prediction process diagram of EMD-BiLSTM algorithm.

Experiment

Data set

The data set adopts the ETT small data set in the power transformer data set,²² which has been published at https://github.com/zhouhaoyi/ETDataset. The ETT-small data set contains data from two power transformers for 2 years. Each data point marked with m is recorded every minute from two regions in the same province of China, named ETT-small-m1 and ETT-small-m2. In addition, we use h to mark the dataset variants with 1 hour granularity, namely ETT-small-h1 and ETT-small-h2. Each data point has eight-dimensional characteristics, including the recording date of the data point, the predicted value “oil temperature,” and six different kinds of external load values, which are HUFL, HULL, MUFL, MULL, LUFL, and LULL. The missing values in the data have been pre-processed with the cleaning method. The data set is very important for the research of the time series. The data is real and effective, and the amount of data is also very large. The ETT-small-h1 and ETT-small-m1 data set was used in this experiment, whose sample data are shown in Tables 1 and 2.

Table 1.

Sample data points of data set ETT-small-h1.

Date	HUFL	HULL	MUFL	MULL	LUFL	LULL	OT
2016/7/1 0:00	5.83E + 00	2.01E + 00	1.60E + 00	4.62E-01	4.20E + 00	1.34E + 00	3.05E + 01
2016/7/1 1:00	5.69E + 00	2.08E + 00	1.49E + 00	4.26E-01	4.14E + 00	1.37E + 00	2.78E + 01
2016/7/1 2:00	5.16E + 00	1.74E + 00	1.28E + 00	3.55E-01	3.78E + 00	1.22E + 00	2.78E + 01
2016/7/1 3:00	5.09E + 00	1.94E + 00	1.28E + 00	3.91E-01	3.81E + 00	1.28E + 00	2.50E + 01

Table 2.

Sample data points of data set ETT-small-m1.

Date	HUFL	HULL	MUFL	MULL	LUFL	LULL	OT
2016/7/1 0:00	5.83E + 00	2.01E + 00	1.60E + 00	4.62E-01	4.20E + 00	1.34E + 00	3.05E + 01
2016/7/1 0:15	5.76E + 00	2.08E + 00	1.49E + 00	4.26E-01	4.26E + 00	1.40E + 00	3.05E + 01
2016/7/1 0:30	5.76E + 00	1.94E + 00	1.49E + 00	3.91E-01	4.23E + 00	1.31E + 00	3.00E + 01
2016/7/1 0:45	5.76E + 00	1.94E + 00	1.49E + 00	4.26E-01	4.23E + 00	1.31E + 00	2.70E + 01

Data prediction with EMD-BiLSTM model

Build BiLSTM model

The experimental environment is Pycharm. Python's keras library is called to build the BiLSTM network. The activation function is the tanh function by default. The mean square error (MSE) is used as the loss function. 80% of the data set is trained 100 times as a training set, and the rest is a test set for prediction. The iterative update algorithms used for BiLSTM include Adagrad algorithm, Adam algorithm, and RMSProp. According to prediction and fitting effect, RMSprop algorithm is used in this experiments.

Decomposition data using EMD

The time series of oil temperature in power tranformer can be regarded as nonlinear and nonstationary signals. Therefore, EMD is employed to extract mono-component and symmetric components from the nonlinear and nonstationary signals by a filtering process. The filtering process is to remove the lowest frequency information until only the highest frequency remains. The EMD results contain a series of IMFs and a residual. At ETT-small-h1 O_t variables are extracted from CSV data set. A total of 17,421 pieces of O_t variables are decomposed by EMD and the results are shown Figure 5.

Figure 5.

EMD components and residual.

From Figure 5, we can find that the time series finally are decomposed into 10 IMF components and 1 residual component. At the same time, it is demonstrated that all the 10 IMF components satisfy the following conditions: the number of extreme and the number of zero crossings is either equal or differ at most by one; at any point, the mean value of the envelope defined by the local maxima and the local minima is zero. For the residual component, it is a monotonic decreasing function which indicates no more IMF can be extracted.

Data prediction

The BiLSTM network is used to model and predict the component data respectively. Finally, the prediction results of every component are reconstructed. And then the RMSE, MAE, and MAPE were selected to evaluate the effectiveness of the proposed method. We assume that O_i and P_i are the observed (ground truth) and predicted values, respectively. These indicators can be formulated as follows.

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(P_{i} - Q_{i})}^{2}}

(12)

MAE = \frac{1}{n} \sum_{i = 1}^{n} | P_{i} - Q_{i} |

(13)

MAPE = \frac{1}{n} \sum_{i = 1}^{n} \frac{| P_{i} - Q_{i} |}{Q_{i}}

(14)

Tables 3 and 4 show the prediction accuracy comparison results, with LSTM, BiLSTM, EMD-BP, and WT-BiLSTM methods, on data set ETT-small-h1, ETT-small-m1, respectively. Wavelet transform is a time-frequency analysis technique that decomposes a signal into wavelets, which are localized in both time and frequency. This allows the wavelet transform to capture both the temporal and frequency characteristics of a signal, making it well-suited for signals that are non-stationary or have complex patterns. So the wavelet transform was also used to compare the experimental results. Daubechies8 is used in wavelet transform. Figures 6 and 7 are used to demonstrate the prediction results comparison on data set ETT-small-h1 and data set ETT-small-m1, respectively. It can be seen that The proposed EMD-BiLSTM-based prediction method has the best performance, which means that EMD sequence decomposition and BiLSTM model are very effective.

Figure 6.

Prediction results comparison on data set ETT-small-h1.

Figure 7.

Prediction results comparison on data set ETT-small-m1.

Table 3.

Prediction accuracy comparison results on data set ETT-small-h1.

	RMSE	MAE	MAPE (%)
LSTM	0.9819	0.7762	9.2173
BiLSTM	0.9803	0.7811	8.5075
EMD-BP	1.7281	1.4369	15.6526
WT-BiLSTM	1.0332	0.7933	8.2517
EMD-BiLSTM	0.7489	0.5453	5.9995

Table 4.

Prediction accuracy comparison results on data set ETT-small-m1.

	RMSE	MAE	MAPE (%)
LSTM	1.1311	0.9334	10.5225
BiLSTM	1.1461	0.9395	10.9040
EMD-BP	1.4052	1.0985	13.5921
WT-BiLSTM	1.9258	1.6941	20.1720
EMD-BiLSTM	1.0540	0.8065	9.1784

Conclusion

As a critical power transmission and transformation equipment, power transformers generate large volumes of time-series data on oil temperature during operation. Extracting useful information from this data can help to monitor the running state of power transformers in real-time, directly impacting the reliability of the power system and power supply. In this work, we propose a technique for forecasting transformer oil temperature based on EMD-BiLSTM. The method uses BiLSTM neural networks and EMD. The only inputs required for this model are time-series data on the temperature of the transformer oil, which is treated as signal data. EMD is used to break down the data and extract frequency and amplitude characteristics as an unsupervised feature learning technique. This approach improves the ability to anticipate short-term trends, particularly for abrupt shifts. In the supervised learning phase, BiLSTM is utilized. The monitored transformer data can be used to predict the transformer's functioning condition in the future, which has tremendous research value.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Yu Xi

References

Lin

Yan

, et al. Prediction method for power transformer running state based on LSTM_DBN network. Energies 2018; 11(7): 1880.

Kabir

Foggo

. Data driven predictive maintenance of distribution transformers. 2018: 312–316. https://doi.org/10.1109/CICED.2018.8592417.

Feng

Jiang

Nie

, et al. State prediction method of power transformer based on grey differential threshold. In: 2018 IEEE Conference on Electrical Insulation and Dielectric Phenomena (CEIDP), 2018, pp. 410–413. https://doi.org/10.1109/CEIDP.2018.8544756.

Lin

, et al. Oil temperature prediction of power transformers based on modified support vector regression machine. Int J Emerg Electr Power Syst 2022. https://doi.org/10.1515/ijeeps-2021-0443

Xiao

Yang

G-j

. Research on intelligent diagnosis method of oil temperature defect in distribution transformer based on machine learning. In: Proceedings of 25th International Conference on Electricity Distribution, Madrid, 3–6 June 2019, pp. 676–680.

Rao

Zhu

Pan

, et al. Bayesian network and association rules-based transformer oil temperature prediction. In: Journal of Physics: Conference Series, Vol. 1314, 3rd International Conference on Electrical, Mechanical and Computer Engineering, 9–11 August 2019, Guizhou, China.

Taheri

Abdali

Rabiee

. Indoor distribution transformers oil temperature prediction using new electro-thermal resistance model and normal cyclic overloading strategy: an experimental case study. IET Gener Transm Distrib 2020; 14: 5792–5803.

Hochreiter

Schmidhuber

. Long short-term memory. Neural Comput 1997; 9: 1735–1780.

Cho

, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling. 2014, arXiv:1412.3555

10.

Sainath

, et al. Convolutional, long short-term memory, fully connected deep neural networks. 2015: 4580-4584.

11.

Zhang

Liu

Zhao

, et al. Air quality predictions with a semi-supervised bidirectional LSTM neural network. Atmos Pollut Res 2021; 12: 328–339.

12.

Huiting

Yuan

Chen

. Short-term load forecasting using EMD-LSTM neural networks with a Xgboost algorithm for feature importance evaluation. Energies 2017; 10: 1168.

13.

Meng

Xie

Sun

. Short-term load forecasting using neural attention model based on EMD. Electr Eng 2022; 10(8): 1168.

14.

Dupond

. A thorough review on the current advance of neural network structures. Annu Rev Control 2019; 14: 200–230.

15.

Abiodun

Jantan

Omolara

, et al. State-of-the-art in artificial neural network applications: a survey. Heliyon 2018; 4: e00938. ISSN 2405-8440. PMC 6260436. PMID 30519653.

16.

Tealab

. Time series forecasting using artificial neural networks methodologies: a systematic review. Future Comput Inform J 2018; 3(2): 334–340. ISSN 2314-7288.

17.

Hochreiter

Schmidhuber

. LSTM can solve hard long time lag problems. Adv Neural Inf Process Syst 1996; 9.

18.

Schmidhuber

. The most cited neural networks all build on work done in my labs. AI Blog. IDSIA, Switzerland, Retrieved, 2022: 4–30.

19.

Siami-Namini

Tavakoli

Namin

. The performance of LSTM and BiLSTM in forecasting time series. In: 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 3285–3292. DOI: 10.1109/BigData47090.2019.9005997.

20.

Zhang

Liu

Yao

, et al. HSipu 2 – A new human physical fitness action dataset for recognition and 3D reconstruction evaluation. 2021: 3166–3175. DOI: 10.1109/CVPRW53098.2021.00354.

21.

Schuster

Paliwal

. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997; 45(11): 2673–2681.

22.

Zhou

Zhang

Peng

, et al. Informer: beyond efficient transformer for long sequence time-series forecasting. 2020; 35(12): 11106-11115.

Power transformer oil temperature prediction based on empirical mode decomposition-bidirectional long short-term memory

Abstract

Keywords

Introduction

EMD-BiLSTM-based time series prediction models

Empirical mode decomposition

The RNN model

Long short-term memory

Bidirectional LSTM

Oil temperature time series prediction based on EMD-BiLSTM

EMD decomposition

Combining EMD-BiLSTM algorithm

Experiment

Data set

Data prediction with EMD-BiLSTM model

Build BiLSTM model

Decomposition data using EMD

Data prediction

Conclusion

Footnotes

Declaration of conflicting interests

Funding

ORCID iD

References