Engine remaining useful life prediction based on PSO optimized multi-layer long short-term memory and multi-source information fusion

Abstract

As the core component of mechanical equipment, the engine's operating state directly determines whether the equipment can operate normally. Predicting the engine remaining useful life (RUL) makes it possible to monitor engine health in real time and to formulate timely and reasonable maintenance plans. Because engine monitoring data are multi-source and span long periods, we propose a direct engine RUL prediction method based on a multi-layer Long Short-Term Memory (LSTM) network optimized by particle swarm optimization (PSO). First, the monitoring variables that best reflect the engine degradation trend are selected, and samples are constructed with a sliding time window. Then, a multi-layer LSTM model is built to mine deep features of the samples and predict the engine RUL. Finally, the hyperparameters of the multi-layer LSTM model are tuned automatically by the PSO algorithm to improve model performance. The effectiveness of the method is verified on the NASA C-MAPSS dataset, using RMSE, MAE, and the scoring function as evaluation metrics. The RMSE and score of the prediction results are 12.35 and 284.1, respectively, and the method achieves higher prediction accuracy than traditional deep learning and machine learning methods.

Keywords

Engine, multi-layer Long Short-Term Memory, particle swarm optimization, remaining useful life

Introduction

As the power source of mechanical equipment, the engine is one of its core components, and its operating state and performance determine whether the equipment can operate stably and reliably. Engines work under complex, changeable operating conditions and in harsh environments, which leads to a high failure rate. An engine failure can disrupt production and cause safety incidents. Predicting the remaining useful life (RUL) of the engine allows a timely and reasonable maintenance and replacement plan to be formulated, which addresses the shortcomings of post-event (corrective) and scheduled maintenance.

The RUL is the length of time that a piece of equipment can continue to operate normally after it has already operated for a period of time.¹ RUL prediction methods currently fall into three main types: model-based approaches,² data-driven approaches,³ and hybrid approaches.⁴ Model-based approaches establish a physical model of the engine's operating laws and degradation process for prediction.^5,6 However, as engine structures become increasingly complex and fault characteristics more diverse, establishing an accurate physical model is difficult. Data-driven approaches, in contrast, do not rely heavily on engineering principles and expert knowledge. They predict the RUL by mining degradation patterns directly from the collected engine monitoring data. With the growth of computing power and the availability of large amounts of sensor data, data-driven approaches have become the mainstream.

Khelif et al.⁷ used support vector regression (SVR) to establish a direct relation between health indicators and sensor values for RUL prediction. This avoided the steps of fitting an equipment health-status curve and setting a failure threshold. Shallow machine learning methods generally rely on expert experience and signal processing techniques to extract features manually, and they are poorly suited to processing large amounts of historical equipment monitoring data. Deep learning methods have strong nonlinear fitting and automatic feature extraction capabilities. They have therefore been widely used to process large volumes of monitoring data and mine their latent features.^8,9 Sbarufatti et al.¹⁰ combined sequential Monte-Carlo sampling with artificial neural networks (ANN) for fatigue crack life prediction. Deutsch et al.¹¹ used the root mean square (RMS) to extract signal features from sensor data and predicted bearing RUL with a Restricted Boltzmann Machine (RBM). Babu et al.¹² used a one-dimensional convolutional neural network (CNN) to process multi-dimensional sensor data for RUL prediction. Ren et al.¹³ proposed a bearing RUL prediction method based on a deep CNN, in which the eigenvector was extracted through the spectrum-principal-energy-vector. Equipment monitoring data are generally multi-dimensional time series in which successive observations are correlated, but the above prediction models do not take this time dependence into account. By introducing a hidden state that memorizes key information, a recurrent neural network (RNN) can effectively mine the temporal features of the data. Long Short-Term Memory (LSTM) adds a cell memory unit to the hidden layer, which makes it superior for processing long time series.^14,15

Heimes¹⁶ used an RNN to estimate equipment RUL directly, without a separate feature extraction step. Yuan et al.¹⁷ applied three RNN variants to RUL prediction. By comparing the results, they concluded that the LSTM model performed better. Using multi-dimensional equipment monitoring data, Malhotra et al.¹⁸ built an LSTM-based Encoder-Decoder to construct a health indicator (HI) for RUL prediction. Data-driven RUL prediction methods can be divided into indirect and direct prediction.^19,20 Indirect prediction fuses the multi-dimensional sensor data into a one-dimensional HI curve and then predicts the RUL from that curve. Direct prediction, by contrast, does not construct an HI curve. Instead, it extracts multi-dimensional fault features from the multi-dimensional sensor data and uses them directly to predict the RUL.

In this paper, we adopt a data-driven direct RUL prediction model that maps sensor data directly to the engine RUL. We propose an engine RUL prediction method based on a multi-layer LSTM network, and we use Particle Swarm Optimization (PSO) to tune the network's hyperparameters automatically and improve prediction accuracy.

The rest of this paper is organized as follows. Section II introduces the data preprocessing methods and the basic theory of the RUL prediction method. Section III verifies the performance of the proposed method on a turbofan engine degradation dataset. Section IV draws conclusions.

RUL prediction method

Multi-sensor data fusion method

The raw data collected by the sensor monitoring system consist of multivariate time series from multiple sensors. For each engine, the multivariate sensor readings are sampled over a period of time. The sensor monitoring data of engine $i$ are expressed as $X_{i} = [x_{1}, x_{2}, \ldots, x_{t}, \ldots, x_{T}]$, where $T$ is the maximum running time step of the engine, $x_{t}$ is the $n$-dimensional sensor monitoring vector at time $t$, and $i$ is the engine id. The data of all engines together constitute the whole dataset.

We adopt a data-driven direct RUL prediction method,^7,21 which maps the collected sensor data directly to the engine health status. The RUL of the system is predicted by performing pattern matching on the multi-dimensional data and extracting m-dimensional fault features that characterize degradation.

The collected engine sensor data include many variables, but not all of them characterize the engine failure process well. Features that are only weakly correlated with engine degradation lead to underfitting and degrade model training. We need to select the variables that characterize or strongly affect the engine operating state. We also want to compress the data and so reduce the input dimension of the deep learning model. For both purposes, this paper adopts the Pearson product-moment correlation coefficient to calculate the correlation between each variable and operating time.²² Based on this correlation analysis, the valuable variables are selected as the final variables for model training.

The criterion for evaluating the correlation between a sensor feature and time is defined as follows:

Corr = \frac{| \sum_{i = 1}^{n} (x_{i} - \bar{x}) (t_{i} - \bar{t}) |}{\sqrt{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}} \sqrt{\sum_{i = 1}^{n} {(t_{i} - \bar{t})}^{2}}}

(1)

where $n$ is the number of samples, $x_{i}$ and $t_{i}$ are the monitoring value and the corresponding time of the $i$-th sample, and $\bar{x}$ and $\bar{t}$ are the mean values of the sensor data and time. If the correlation coefficient between a variable and time is small, the variable varies largely at random rather than following the degradation process, and it should be discarded.
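Eq. (1) can be sketched in NumPy as follows. This is a minimal sketch, assuming time is the cycle index $1, \ldots, T$ when no explicit time vector is given. The function name `corr_criterion` is hypothetical, and returning 0 for a constant sensor is our own convention.

```python
import numpy as np

def corr_criterion(x, t=None):
    """Absolute Pearson correlation between one sensor series x and time t (Eq. 1)."""
    x = np.asarray(x, dtype=float)
    # Assumption: if no time vector is given, use the cycle index 1..T.
    t = np.arange(1, len(x) + 1, dtype=float) if t is None else np.asarray(t, dtype=float)
    xc = x - x.mean()
    tc = t - t.mean()
    denom = np.sqrt(np.sum(xc ** 2)) * np.sqrt(np.sum(tc ** 2))
    if denom == 0.0:
        # A constant sensor carries no trend information (our convention).
        return 0.0
    return float(abs(np.sum(xc * tc)) / denom)
```

For example, a perfectly rising or falling series scores 1.0, and the zigzag series `[1, 3, 2, 4]` scores 0.8.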

Variables that characterize the engine failure process should be highly correlated with time and should also show a monotonic trend,^23,24 because engine degradation accumulates gradually until failure. Feature selection is therefore performed by calculating both the correlation and the monotonicity of the sensor data with respect to the degradation process. Because the collected sensor data are affected by noise, we use an improved monotonicity criterion. It compares the means of two adjacent pairs of points instead of two consecutive points, so isolated noise spikes have less influence on the trend evaluation.

The monotonicity criterion for evaluating the monotonic trend of sensor features is:

Mon = \left| \frac{\mathrm{Num}(Δ > 0)}{T - 3} - \frac{\mathrm{Num}(Δ < 0)}{T - 3} \right|

(2)

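Δ_{i} = \frac{x_{i + 3} + x_{i + 2}}{2} - \frac{x_{i + 1} + x_{i}}{2}

(3)

where $T$ is the maximum running time step of the engine, $x_{i}$ is the sensor value at time step $i$, and $Δ_{i}$ is computed for $i = 1, \ldots, T - 3$. The two numerators in Eq. (2) count the positive and negative differences, which is why both are divided by $T - 3$.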
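The noise-robust monotonicity of Eqs. (2) and (3) can be sketched in NumPy as follows. It is a minimal sketch, and the function name `mon_criterion` is hypothetical. The slices use 0-based indexing, so the $T - 3$ differences run over positions 0 to $T - 4$.

```python
import numpy as np

def mon_criterion(x):
    """Noise-robust monotonicity of one sensor series (Eqs. 2 and 3)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    if T < 4:
        raise ValueError("need at least 4 time steps")
    # Delta_i = mean(x_{i+3}, x_{i+2}) - mean(x_{i+1}, x_i), giving T - 3 values
    delta = (x[3:] + x[2:-1]) / 2.0 - (x[1:-2] + x[:-3]) / 2.0
    n_pos = np.count_nonzero(delta > 0)
    n_neg = np.count_nonzero(delta < 0)
    return abs(n_pos - n_neg) / (T - 3)
```

The pairwise averaging is what makes the criterion tolerant of noise. The rising but zigzagging series `[1, 3, 2, 4, 3, 5]` scores 1.0. A naive sign count over consecutive differences would score it only |3 − 2| / 5 = 0.2; that naive version is shown for contrast and is not used in the paper.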

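Finally, the sensors are selected with the composite criterion $Corr\text{-}Mon$. It is a linear combination of $Corr$ and $Mon$, in which the weight $ω \in [0, 1]$ balances the two criteria and $\mathrm{mean}(\cdot)$ denotes the average over engines:

Corr\text{-}Mon = ω \cdot \mathrm{mean} (Corr) + (1 - ω) \cdot \mathrm{mean} (Mon)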

(4)
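The composite selection of Eq. (4) can be sketched as follows. It assumes the per-engine $Corr$ and $Mon$ values of every sensor have already been computed. The 0.5 threshold is the value used in the experiments. The weight `omega=0.5` is a hypothetical default, since the paper does not state $ω$. We also assume "meets the threshold" means strictly greater.

```python
import numpy as np

def select_sensors(corr, mon, omega=0.5, threshold=0.5):
    """Corr-Mon composite criterion (Eq. 4).

    corr, mon: arrays of shape (num_engines, num_sensors) with per-engine
    Corr and Mon values. omega is a hypothetical default weight.
    Returns the composite score per sensor and the indices above the threshold.
    """
    corr = np.asarray(corr, dtype=float)
    mon = np.asarray(mon, dtype=float)
    # Average each criterion over engines, then combine linearly.
    score = omega * corr.mean(axis=0) + (1.0 - omega) * mon.mean(axis=0)
    selected = np.flatnonzero(score > threshold)
    return score, selected
```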

Data preprocessing

Sensor data is inevitably affected by environmental noise and sensor errors during the collection and transmission process. In order to avoid the adverse impact of some wrong data on modeling, it is necessary to preprocess sensor data.

Because the collected sensor variables have different units and value ranges, their numerical scales differ greatly. This affects the training of the deep learning model and reduces prediction accuracy, so the data need to be standardized. In this paper, z-score standardization is applied to each sensor variable so that it has mean 0 and standard deviation 1:

x^{*} = \frac{x - \bar{x}}{σ}

(5)

where $σ$ and $\bar{x}$ are the standard deviation and mean of the original variable.
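Eq. (5) can be sketched with two helpers: one fits the statistics, the other applies them. The split is an assumption: we take the mean and standard deviation from the training data and reuse them for the test data, and the paper does not state this explicitly. NumPy's default population standard deviation (`ddof=0`) and the guard for constant sensors are also our own choices. The helper names are hypothetical.

```python
import numpy as np

def fit_zscore(train):
    """Per-sensor mean and standard deviation from training data (Eq. 5)."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)          # population std (ddof=0), an assumption
    std[std == 0] = 1.0              # guard against constant sensors
    return mean, std

def apply_zscore(x, mean, std):
    """Standardize data with previously fitted statistics."""
    return (np.asarray(x, dtype=float) - mean) / std
```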

In this paper, a sliding time window is used to generate training samples in the input form required by the LSTM model. Each model input is a two-dimensional tensor of size $num\_time \times num\_sensor$. Here $num\_sensor$ is the number of sensors selected by the composite criterion above, and $num\_time$ is the length of the time window, which slides forward one time unit at a time. The longer the time window, the more raw data each sample contains and the longer the temporal dependencies the LSTM network can extract. If engine $i$ has $T_{i}$ time steps, $T_{i} + 1 - num\_time$ training samples can be generated from it.
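The sample construction described above can be sketched as follows. It is a minimal sketch with a hypothetical function name. It shows that a stride-1 window over $T_{i}$ steps yields $T_{i} + 1 - num\_time$ samples.

```python
import numpy as np

def sliding_windows(series, num_time):
    """Cut a (T, num_sensor) series into overlapping windows with stride 1.

    Returns an array of shape (T + 1 - num_time, num_time, num_sensor).
    """
    series = np.asarray(series, dtype=float)
    T = series.shape[0]
    if T < num_time:
        raise ValueError("series shorter than the window")
    return np.stack([series[s:s + num_time] for s in range(T + 1 - num_time)])
```

For a 5-step series and a window of 3, this gives 5 + 1 − 3 = 3 samples.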

Multi-layer LSTM model construction

The sensor data reflecting the engine degradation trend are time series. An RNN is designed specifically for time series data and consists of an input layer, a hidden layer, and an output layer. The hidden output at the current moment is fed back and becomes part of the input at the next moment, so the current output depends on historical information. LSTM improves on the RNN by adding a cell memory unit to the hidden layer, which gives the model the ability to learn long-term dependencies.²⁵ This effectively alleviates the vanishing-gradient problem and makes LSTM more advantageous than a standard RNN for long data sequences.

The LSTM neural network introduces three gates in the hidden layer: the input gate, the forget gate, and the output gate. They control how the memory is updated and forgotten, and thus regulate the storage and flow of information in the hidden unit.^26,27 The LSTM cell unrolled in time is shown in Figure 1.

Figure 1.

LSTM cells unrolled in time.

In Figure 1, $x_{t}$ is the input at the current time, $C_{t}$ is the cell state (long-term memory), and $h_{t}$ is the hidden state (short-term memory). $i_{t}$, $o_{t}$, and $f_{t}$ are the input, output, and forget gates, and $σ$ is the sigmoid activation function of the gates. The forget gate decides how much of the previous cell state $C_{t - 1}$ is kept:

f_{t} = σ (W_{f} x_{t} + W_{f}^{'} h_{t - 1} + b_{f})

(6)

${\tilde{C}}_{t}$ is the candidate value of the cell state. The input gate determines which values will be updated, and a tanh layer creates a new candidate vector that will be added to the cell state:

i_{t} = σ (W_{i} x_{t} + W_{i}^{'} h_{t - 1} + b_{i})

(7)

{\tilde{C}}_{t} = \tanh (W_{C} x_{t} + W_{C}^{'} h_{t - 1} + b_{C})

(8)

The cell state is then updated: the forget gate scales the previous cell state, and the input gate scales the candidate values:

C_{t} = f_{t} * C_{t - 1} + i_{t} * {\tilde{C}}_{t}

(9)

The cell state value and the result of the output gate determine the final output:

o_{t} = σ (W_{o} x_{t} + W_{o}^{'} h_{t - 1} + b_{o})

(10)

h_{t} = o_{t} * \tanh (C_{t})

(11)

In these equations, $W_{f}$, $W_{i}$, $W_{o}$, $W_{C}$, $W_{f}^{'}$, $W_{i}^{'}$, $W_{o}^{'}$, and $W_{C}^{'}$ are weight matrices, and $b_{f}$, $b_{i}$, $b_{C}$, and $b_{o}$ are bias vectors.
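One LSTM time step following Eqs. (6) to (11) can be sketched in plain NumPy as follows. This is an illustration only, not the training code used in the paper. The dictionary keys `U_*` stand for the primed weights $W^{'}$ in the text, and the helper names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following Eqs. (6)-(11).

    p holds the weights: W_* act on x_t, U_* (the primed W' in the text) act on h_{t-1}.
    """
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])        # forget gate, Eq. (6)
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])        # input gate, Eq. (7)
    c_tilde = np.tanh(p["W_C"] @ x_t + p["U_C"] @ h_prev + p["b_C"])  # candidate, Eq. (8)
    c = f * c_prev + i * c_tilde                                       # cell state, Eq. (9)
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])        # output gate, Eq. (10)
    h = o * np.tanh(c)                                                 # hidden state, Eq. (11)
    return h, c

def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    p = {}
    for g in ("f", "i", "C", "o"):
        p["W_" + g] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p["U_" + g] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p["b_" + g] = np.zeros(n_hidden)
    return p
```

With all weights set to zero, every gate outputs 0.5 and the candidate is 0. A unit previous cell state therefore halves to 0.5, which makes Eq. (9) easy to check by hand.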

In this paper, multiple LSTM layers are stacked to mine deep abstract features. The output of each layer is used as the input of the next layer, which improves the nonlinear fitting ability. The state of the $n$th layer at time $t$ is expressed as follows:

c_{t}^{(n)} = f_{t}^{(n)} ⊙ c_{t - 1}^{(n)} + i_{t}^{(n)} ⊙ {\tilde{c}}_{t}^{(n)}

(12)

h_{t}^{(n)} = o_{t}^{(n)} ⊙ \tanh (c_{t}^{(n)})

(13)

\begin{bmatrix} i_{t}^{(n)} \\ f_{t}^{(n)} \\ o_{t}^{(n)} \\ {\tilde{c}}_{t}^{(n)} \end{bmatrix} = \begin{bmatrix} σ \\ σ \\ σ \\ \tanh \end{bmatrix} \left( \begin{bmatrix} W_{i, x}^{(n)} & W_{i, h}^{(n)} \\ W_{f, x}^{(n)} & W_{f, h}^{(n)} \\ W_{o, x}^{(n)} & W_{o, h}^{(n)} \\ W_{\tilde{c}, x}^{(n)} & W_{\tilde{c}, h}^{(n)} \end{bmatrix} \begin{bmatrix} h_{t}^{(n - 1)} \\ h_{t - 1}^{(n)} \end{bmatrix} \right)

(14)

where $h_{t}^{(n - 1)}$ is the hidden state of layer $n - 1$ at time $t$, with $h_{t}^{(0)} = x_{t}$ for the first layer.

Finally, a fully connected layer placed after the multi-layer LSTM outputs the predicted engine RUL:

y_{out} = \mathrm{ReLU} (W_{q} h_{t}^{(n)} + b_{q})

(15)

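where $W_{q}$ and $b_{q}$ are the weight and bias of the output layer, and $h_{t}^{(n)}$ is the hidden state of the top LSTM layer at the last time step of the window.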
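The stacked model of Eqs. (12) to (15) can be sketched as a NumPy forward pass. It is for illustration only: the weights are random placeholders, and no training is shown. Each layer's four gate blocks are stacked row-wise as in Eq. (14) and applied to $[h_{t}^{(n-1)}; h_{t-1}^{(n)}]$. We add a bias vector as in Eqs. (6) to (10), although Eq. (14) omits it. The function name and the layer sizes in the usage are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stacked_lstm_predict(window, layers, W_q, b_q):
    """Forward pass of a stacked LSTM (Eqs. 12-14) plus ReLU output layer (Eq. 15).

    window: (num_time, num_sensor) input sample.
    layers: list of (W, b); W has shape (4*H, D_in + H) and stacks the
            i, f, o, c~ blocks of Eq. (14) row-wise.
    """
    seq = np.asarray(window, dtype=float)     # h_t^(0) = x_t
    for W, b in layers:
        H = W.shape[0] // 4
        h = np.zeros(H)
        c = np.zeros(H)
        outputs = []
        for x_t in seq:
            z = W @ np.concatenate([x_t, h]) + b
            i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
            c_tilde = np.tanh(z[3 * H:])
            c = f * c + i * c_tilde           # Eq. (12)
            h = o * np.tanh(c)                # Eq. (13)
            outputs.append(h)
        seq = np.stack(outputs)               # this layer's outputs feed the next layer
    y = W_q @ seq[-1] + b_q                   # top layer, last time step
    return np.maximum(y, 0.0)                 # ReLU, Eq. (15)
```

The ReLU in Eq. (15) guarantees a non-negative RUL. With a zero output weight, the prediction reduces to `max(b_q, 0)`.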

Particle Swarm optimization

The prediction accuracy and goodness of fit of the model depend on the hyperparameters of the multi-layer LSTM model, such as the number of neurons in each LSTM layer, the time window size, and the batch size. Manual tuning is inefficient and rarely finds good settings. Swarm intelligence optimization algorithms have strong global convergence and robustness, and they do not depend on strict mathematical properties of the optimization problem such as differentiability. They can therefore be used to optimize network hyperparameters. PSO is simple, effective, and easy to implement, and it has great potential for optimizing neural networks.

PSO is a population-based stochastic optimization technique that imitates the flocking behavior of birds. Each particle in the search space represents a candidate solution. Suppose a swarm of $m$ particles flies through an $n$-dimensional space at certain velocities. Each particle is described by three $n$-dimensional vectors: the position vector $x_{i}$, the velocity vector $v_{i}$, and its historical best position $p_{i}$:

x_{i} = (x_{i1}, x_{i2}, \ldots, x_{in})

(16)

v_{i} = (v_{i1}, v_{i2}, \ldots, v_{in})

(17)

p_{i} = (p_{i1}, p_{i2}, \ldots, p_{in})

(18)

While the swarm searches for the target, each particle adjusts its next move using both the best position it has reached itself and the best position found by the whole swarm. In each iteration, the fitness value of every particle is calculated and its velocity and direction are updated. This repeats until the optimal position, that is, the optimal solution of the problem, is found. Sharing information about the target position across the swarm accelerates the search. The particles update their velocity and position according to the following two formulas:²⁸

v_{i} (t + 1) = w v_{i} (t) + c_{1} r_{1} [p_{i} (t) - x_{i} (t)] + c_{2} r_{2} [p_{g} (t) - x_{i} (t)]

(19)

x_{i} (t + 1) = x_{i} (t) + v_{i} (t + 1)

(20)

where $p_{g}$ is the best position found by the whole swarm, $c_{1}$ is the local learning factor, $c_{2}$ is the global learning factor, $w$ is the inertia weight, and $r_{1}$ and $r_{2}$ are random numbers in $[0, 1]$. The steps for optimizing the multi-layer LSTM model with PSO are as follows:
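The following worked example shows a single update with Eqs. (19) and (20). The values $w = 0.7$, $c_{1} = c_{2} = 1.5$, $r_{1} = 0.2$ and $r_{2} = 0.5$ are illustrative only; the paper does not specify them. Consider one dimension of a particle, for example the neuron count of one LSTM layer, at $x_{i}(t) = 100$ with $v_{i}(t) = 2$, personal best $p_{i}(t) = 120$ and global best $p_{g}(t) = 80$. We assume velocities are clipped to $[-v_{\max}, v_{\max}]$ with $v_{\max} = 5$.

v_{i} (t + 1) = 0.7 \times 2 + 1.5 \times 0.2 \times (120 - 100) + 1.5 \times 0.5 \times (80 - 100) = 1.4 + 6 - 15 = -7.6

v_{i} (t + 1) = \max (-v_{\max}, \min (v_{\max}, -7.6)) = -5, \quad x_{i} (t + 1) = 100 + (-5) = 95

Here the pull toward the global best outweighs the pull toward the personal best because $r_{2}$ is larger than $r_{1}$. The velocity clamp limits how far a single step can move the particle.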

Step 1: Treat the hyperparameters of the multi-layer LSTM model to be optimized as the dimensions of each particle, and set the value range and the maximum velocity $v_{\max}$ of each parameter.

Step 2: Initialize the swarm size and the velocity and position of each particle, where each particle is a multi-dimensional real-valued vector, and initialize $p_{i}$ and $p_{g}$ at the same time.

Step 3: Use the vector of each particle as the hyperparameter values of the multi-layer LSTM model, build the network, and train it on the training samples.

Step 4: Use an evaluation metric of the prediction results, such as the score, as the fitness value. Compute the fitness of each particle and determine $p_{i}$ and $p_{g}$.

Step 5: Check whether the termination conditions are met. If they are, go to Step 7; otherwise, go to Step 6.

Step 6: Update the velocity and position of each particle using Eqs. (19) and (20), and return to Step 3.

Step 7: Take the global best position $p_{g}$ as the optimal hyperparameters of the model, save the optimization result, and end the algorithm.
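Steps 1 to 7 can be sketched as a generic PSO minimizer. In the paper, the fitness is the score of a trained multi-layer LSTM, which is expensive to compute; the usage below substitutes a toy quadratic function so the loop runs instantly. Several choices are assumptions, not taken from the text: a fixed iteration budget as the termination condition, the values of `w`, `c1` and `c2`, and clipping positions to the search range. The function name is hypothetical.

```python
import numpy as np

def pso_minimize(fitness, lower, upper, n_particles=20, v_max=5.0, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over a box with standard PSO (Eqs. 19-20, Steps 1-7)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)    # Step 1: value ranges
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    # Step 2: random positions and velocities; personal bests start at the positions.
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = rng.uniform(-v_max, v_max, size=(n_particles, dim))
    p = x.copy()
    p_fit = np.array([fitness(xi) for xi in x])          # Steps 3-4
    g = p[np.argmin(p_fit)].copy()
    g_fit = p_fit.min()
    for _ in range(n_iter):                               # Step 5: fixed budget (assumption)
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)   # Eq. (19), Step 6
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, lower, upper)                    # Eq. (20), kept in range
        fit = np.array([fitness(xi) for xi in x])           # Steps 3-4
        improved = fit < p_fit
        p[improved] = x[improved]
        p_fit[improved] = fit[improved]
        if p_fit.min() < g_fit:
            g_fit = p_fit.min()
            g = p[np.argmin(p_fit)].copy()
    return g, g_fit                                       # Step 7
```

To tune the LSTM, `fitness` would build and train the model from a decoded particle and return the score on the evaluation data. That is why each PSO iteration costs one full training run per particle.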

General procedure of the proposed method

The overall process of the engine RUL prediction method based on PSO optimized multi-layer LSTM neural network is shown in Figure 2. It is mainly divided into the following steps:

Step 1: Data preprocessing.

Figure 2.

General process of the proposed method.

The training set consists of the historical engine monitoring data. The engine variables that best reflect the degradation trend are selected with the composite correlation and monotonicity criterion. The selected data are then standardized with the z-score to eliminate the effect of differing units and value ranges between indicators.

Step 2: Sliding time window to construct training samples.

The processed training data are converted into training samples with a sliding time window, each a two-dimensional tensor of fixed window size. The RUL at the last time step of the window is used as the sample label. If an engine's monitoring sequence is shorter than the time window, zeros are appended after the data to form a sample.
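Labeling and zero padding in Step 2 can be sketched as follows. For a padded (short) sequence, we assume the label is the RUL at the last real time step, since the appended zero rows carry no time index of their own. The function name is hypothetical.

```python
import numpy as np

def make_samples(series, rul, num_time):
    """Stride-1 windows labeled with the RUL at each window's last time step.

    Sequences shorter than num_time are padded with zero rows after the data;
    the label is then the RUL at the last real time step (an assumption).
    """
    series = np.asarray(series, dtype=float)
    rul = np.asarray(rul, dtype=float)
    T, n_sensor = series.shape
    if T < num_time:
        padded = np.vstack([series, np.zeros((num_time - T, n_sensor))])
        return padded[None, :, :], rul[-1:]
    X = np.stack([series[s:s + num_time] for s in range(T + 1 - num_time)])
    y = rul[num_time - 1:]          # label of window s is rul[s + num_time - 1]
    return X, y
```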

Step 3: Constructing multi-layer LSTM model.

The multi-layer LSTM engine RUL prediction model is built, and the preprocessed training samples are fed into it for training. Part of the training set is held out as a validation set during training.

Step 4: RUL prediction.

The engine test set is preprocessed as in Steps 1 and 2 to construct test samples, which are fed into the trained LSTM prediction model. The model outputs the predicted RUL at the last time step of each window.

Step 5: PSO optimization.

PSO automatically optimizes hyperparameters of the multi-layer LSTM prediction model, such as the time window size, the number of neurons in each LSTM layer, and the batch size, to improve the model fit, performance, and prediction accuracy.

Step 6: Evaluation of the prediction method.

The predicted engine RUL is compared with the actual engine RUL, and the performance of the prediction model is evaluated by the scoring function (S),²⁹ RMSE,³⁰ and MAE.³¹ The formulas are defined as follows:

S = \sum_{i = 1}^{n} s_{i}, \quad s_{i} = \begin{cases} e^{- \frac{d_{i}}{13}} - 1, & d_{i} < 0 \\ e^{\frac{d_{i}}{10}} - 1, & d_{i} \geq 0 \end{cases}

(21)

where $n$ is the number of samples in the test set, $d_{i} = RUL_{predict} - RUL_{true}$ for the $i$-th sample, $RUL_{predict}$ is the predicted RUL, and $RUL_{true}$ is the actual RUL. Early predictions ($d_{i} < 0$, predicted RUL smaller than the actual RUL) are penalized less than late predictions ($d_{i} > 0$) of the same magnitude, because a late prediction risks failure before maintenance.

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} d_{i}^{2}}

(22)

MAE = \frac{1}{n} \sum_{i = 1}^{n} | d_{i} |

(23)
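The three metrics in Eqs. (21) to (23) can be computed together as follows. It is a minimal sketch, and the function name is hypothetical.

```python
import numpy as np

def evaluate(rul_pred, rul_true):
    """Scoring function S (Eq. 21), RMSE (Eq. 22) and MAE (Eq. 23)."""
    d = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
    # Early predictions (d < 0) use the gentler exponent 13, late ones use 10.
    s = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return {
        "S": float(s.sum()),
        "RMSE": float(np.sqrt(np.mean(d ** 2))),
        "MAE": float(np.mean(np.abs(d))),
    }
```

The asymmetry is easy to see with the errors d = [−13, 0, 10]. Predicting 13 cycles early and predicting 10 cycles late each cost e − 1 ≈ 1.72. S is therefore 2(e − 1), RMSE is √(269/3) ≈ 9.47, and MAE is 23/3 ≈ 7.67.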

Experimental verification

Engine experimental data

This paper uses the C-MAPSS dataset³² as experimental data to test the effectiveness of the proposed method. The dataset is provided by NASA's Prognostics Center of Excellence and uses a turbofan engine as the experimental object. Each sub-dataset includes training engines and test engines. Each data file has 26 columns and consists of multiple multi-dimensional time series. The columns are the unit number, the time (in cycles), three operational settings, and 21 sensor measurements. Engines in the training set have complete, labeled run-to-failure data. The test set provides only the data up to some point before failure, and the true RUL values for validation are given in the RUL_FD00X files. This paper mainly uses the FD001 sub-dataset, which includes 100 training engines and 100 test engines, as listed in Table 1.
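Reading a C-MAPSS file into per-engine arrays can be sketched as follows. The column names (`unit`, `cycle`, `setting1` to `setting3`, `s1` to `s21`) are our own labels following the column order described above. Both helper names are hypothetical.

```python
import numpy as np

COLUMNS = (["unit", "cycle", "setting1", "setting2", "setting3"]
           + [f"s{k}" for k in range(1, 22)])   # 26 columns in total

def load_cmapss(source):
    """Read a whitespace-separated C-MAPSS file and group rows by engine unit.

    source: a path, file object, or list of text lines accepted by np.loadtxt.
    Returns {unit_id: array of shape (T_i, 26)} with rows sorted by cycle.
    """
    data = np.atleast_2d(np.loadtxt(source))
    if data.shape[1] != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {data.shape[1]}")
    engines = {}
    for unit in np.unique(data[:, 0]):
        rows = data[data[:, 0] == unit]
        engines[int(unit)] = rows[np.argsort(rows[:, 1])]
    return engines

def sensor_columns(names):
    """Column indices for sensor names written as in Table 2, e.g. 'S2'."""
    return [COLUMNS.index(n.lower()) for n in names]
```

For example, `sensor_columns(["S2", "S3", "S4"])` returns the column indices of the first three sensors selected in Table 2.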

Table 1.

C-MAPSS FD001 dataset.

Data set	Train trajectories	Test trajectories	Conditions	Fault modes
FD001	100	100	1	1

At the beginning of operation the turbofan engine is healthy. After a period of operation it begins to degrade gradually, and the degradation accelerates toward the end of its life. Therefore, a piecewise linear RUL label is used instead of a simple linear RUL label. The maximum RUL is capped at a value between 120 and 130,^33,34 so that the model can better extract degradation features and improve prediction accuracy.
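The piecewise linear label can be sketched for one run-to-failure training engine. The cap `max_rul=125` is a hypothetical choice inside the 120 to 130 range quoted above, and the function name is hypothetical.

```python
import numpy as np

def piecewise_rul(num_cycles, max_rul=125):
    """Piecewise linear RUL labels for one run-to-failure engine.

    The true RUL at cycle t (1-based) is num_cycles - t. Values above max_rul
    are clipped, so early (healthy) cycles share the constant label max_rul.
    """
    linear = np.arange(num_cycles - 1, -1, -1)   # num_cycles - 1, ..., 1, 0
    return np.minimum(linear, max_rul)
```

For a 5-cycle engine with a cap of 3, the labels are `[3, 3, 2, 1, 0]`. The first label is flat at the cap, and the rest decrease linearly to 0 at failure.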

Multi-sensor data fusion and filter

The FD001 training and test sets record three operational settings and 21 engine sensor measurements. We want the variables that represent or strongly affect the engine state, so the correlation and monotonicity of each parameter in the training set are calculated with the criteria above. The 100 training engines are divided into four groups of 25, and the results within each group are averaged. The results are shown in Figure 3.

Figure 3.

Correlation and monotonicity mean of FD001 data: (a) id: 1–25, (b) id: 26–50, (c) id: 51–75, and (d) id: 76–100.

Then, the parameters that meet the threshold condition are selected with the composite criterion, with the threshold set to 0.5. Figure 4 shows that the same sensor parameters are selected in every group. The selected parameters are listed in Table 2.

Figure 4.

Corr-Mon compound criteria of FD001 data: (a) id: 1–25, (b) id: 26–50, (c) id: 51–75, and (d) id: 76–100.

Table 2.

Selected sensor parameters of FD001.

Selected sensor parameters

S2, S3, S4, S7, S8, S9, S11, S12, S13, S14, S15, S17, S20, S21

The selected parameter features are preprocessed by z-score standardization, and then training samples are constructed by sliding time windows, which are used as input for the subsequent models.

RUL prediction using multi-layer LSTM neural network

This paper maps the engine monitoring data directly to the RUL and uses the FD001 sub-dataset as the experimental sample. The processed training samples are fed into the multi-layer LSTM model for training, and 10% of the training samples are held out as a validation set. The output layer is a Dense layer with a single neuron and ReLU activation, which outputs the predicted engine RUL. The model is trained with the Adam optimizer,³⁵ using RMSE as the loss function. Finally, the processed engine test data are fed into the trained model to predict the RUL.
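The 10% hold-out and the RMSE loss can be sketched in NumPy as follows. The random, sample-level split is an assumption, as are the seed and the function names.

```python
import numpy as np

def holdout_split(X, y, val_fraction=0.1, seed=0):
    """Randomly hold out a fraction of the windowed samples as a validation set."""
    X = np.asarray(X)
    y = np.asarray(y)
    idx = np.random.default_rng(seed).permutation(len(X))
    n_val = int(round(val_fraction * len(X)))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

def rmse_loss(y_pred, y_true):
    """RMSE used as the training loss."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Neighboring windows from the same engine overlap heavily. A random sample-level split therefore places nearly identical samples in both sets, which can make validation error look optimistic. Splitting by engine id is a common alternative; the text does not say which variant was used.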

Compared with a single LSTM layer, stacking multiple LSTM layers can extract deeper sequence features. However, too many layers increase training time and can cause overfitting. A comparison experiment with one to six LSTM layers was therefore conducted to explore how the number of layers affects model performance. Each configuration was run several times, and the averaged results are shown in Table 3. As Table 3 and Figure 5 show, the four-layer model achieves the lowest RMSE, MAE, and score, whereas the error increases significantly when the number of layers reaches six.

Table 3.

Performance of multi-layer LSTM with different numbers of layers.

Number of layers	RMSE	MAE	Score	Time (s)
1	14.85	11.04	468.7	66
2	14.65	10.53	411.3	123
3	15.49	11.74	480.9	220
4	13.94	10.50	342.3	319
5	14.62	11.05	402.6	430
6	15.98	12.00	643.2	578

Figure 5.

Normalized values of evaluation criteria for different number of layers.

The next step is to choose the length of the time window, since an appropriate window size allows the engine RUL to be predicted well. Based on the four-layer LSTM, eight time window sizes from 5 to 40 in steps of 5 were compared to examine their influence on model performance. The results are shown in Table 4, and Figure 6 shows that the prediction results are best at a window size of about 30.

Table 4.

Performance of multi-layer LSTM with different time window sizes.

Time window size	RMSE	MAE	Score	Time (s)
5	17.47	12.25	1002	60
10	19.23	13.74	1447	127
15	17.89	12.85	1034	190
20	17.03	12.24	813.4	197
25	17.02	12.73	519.0	312
30	13.94	10.50	342.3	319
35	15.37	11.44	469.3	443
40	16.06	11.73	515.7	604

Figure 6.

Normalized values of evaluation criteria and training time for different time window size.

According to the layer number experiment and time window size experiment, the time window size selected in this paper is 30, and the layer number of LSTM model is set to 4.

RUL prediction using PSO optimized multi-layer LSTM neural network

The number of neurons in each layer and the batch size of the multi-layer LSTM model also strongly affect prediction accuracy. Manual tuning is inefficient and rarely finds good settings, whereas PSO offers high precision and easy implementation. PSO is therefore used to tune these hyperparameters automatically and improve the prediction accuracy and performance of the LSTM model.

Based on the four-layer LSTM with a time window size of 30, the number of neurons in each LSTM layer and the batch size are taken as the dimensions of the PSO particles. The number of neurons in each layer ranges over [10, 200], and the batch size ranges over [10, 520]. The scoring function of the LSTM prediction results is used as the fitness value. The swarm size m is set to 20, and the maximum velocity $v_{\max}$ is set to 5. Whenever the global best position is updated, the corresponding model is saved. The prediction results of the PSO-optimized multi-layer LSTM network are shown in Figure 7.
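PSO moves particles through continuous space, but neuron counts and batch sizes must be integers. One way to bridge this is to decode each position before building the model. The encoding below is an assumption: four neuron counts followed by the batch size, rounded to the nearest integer and clipped to the stated ranges. The function name is hypothetical.

```python
import numpy as np

NEURON_RANGE = (10, 200)   # per LSTM layer, as stated in the text
BATCH_RANGE = (10, 520)

def decode_particle(position, num_layers=4):
    """Map a continuous PSO position to integer hyperparameters.

    position: length num_layers + 1 -> [neurons_1, ..., neurons_L, batch_size].
    Values are rounded to the nearest integer and clipped to the search ranges.
    """
    position = np.asarray(position, dtype=float)
    if position.size != num_layers + 1:
        raise ValueError("position must hold one entry per layer plus the batch size")
    neurons = np.clip(np.rint(position[:num_layers]), *NEURON_RANGE).astype(int)
    batch = int(np.clip(np.rint(position[-1]), *BATCH_RANGE))
    return [int(n) for n in neurons], batch
```

With this encoding, a five-dimensional particle corresponds to one complete model configuration.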

Figure 7.

RUL prediction result of FD001 test data.

Comparison of the proposed method and traditional methods

The PSO-optimized multi-layer LSTM model is compared with SVR, CNN, CNN + PSO, DNN, DBN, ELSTMNN,¹⁴ RNN-Autoencoder,³⁶ a basic (one-layer) LSTM, and the unoptimized multi-layer LSTM. The CNN used in this paper has five layers: two convolution layers (Conv1 and Conv2), two max-pooling layers (Maxpooling1 and Maxpooling2), and one fully connected layer (FC). In the CNN + PSO method, the number and size of the convolution kernels are the hyperparameters to be optimized. The deep neural network (DNN) has four hidden layers with 500, 400, 300, and 100 neurons, respectively. The structural parameters of the deep belief network (DBN), i.e. the number of hidden layers and the number of neurons per hidden layer, depend strongly on problem complexity and on the number of available training samples. Given the complexity of the studied problem and the limited number of training samples, these parameters are kept small in our experiments; each DBN has three hidden layers. Table 5 shows that PSO optimization substantially improves prediction accuracy. Compared with the unoptimized four-layer LSTM, it reduces RMSE, MAE, and score by 11.4%, 11.6%, and 17%, respectively. Figure 8 shows the predictions of the different methods for one engine in the test set. Compared with the other models, the RUL predicted by the LSTM models is closer to the true RUL during degradation. The predicted curve is also smoother, especially in the second half of the degradation process.
In the second half of the degradation process, the confidence interval of the LSTM models also fluctuates little and stays close to the true RUL. This indicates that the LSTM models predict RUL well in the later stage of operation and that their predictions are stable and repeatable. Compared with the four-layer LSTM model, the optimized model fits the actual RUL better throughout degradation, including the early stage, and its predictions fluctuate less. Table 5 and Figure 9 show that the LSTM models have advantages over the other methods on long time series. The proposed method achieves the lowest RMSE of all compared methods and the lowest MAE among those for which MAE is reported. Its score is the lowest except for RNN-Autoencoder (220.0).

Table 5.

Comparisons of other methods.

Methods	RMSE	MAE	Score
SVR	15.69	11.90	500.8
CNN	17.46	13.19	650.0
PSO + CNN	16.89	12.96	586.9
DNN	13.56	10.26	348.3
DBN	18.48	14.23	700.1
ELSTMNN[14]	18.22	N/A	N/A
RNN-autoencoder[36]	13.58	N/A	220.0
One layer LSTM	14.85	11.04	468.7
Multi-layer LSTM	13.94	10.50	342.3
The proposed method	12.35	9.28	284.1
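The reductions quoted for PSO optimization (11.4%, 11.6%, and 17%) are relative decreases with respect to the unoptimized multi-layer LSTM in Table 5. A short check; the helper name `relative_reduction` is ours:

```python
# Values copied from Table 5: unoptimized multi-layer LSTM vs. the proposed PSO-optimized model
before = {"RMSE": 13.94, "MAE": 10.50, "Score": 342.3}
after = {"RMSE": 12.35, "MAE": 9.28, "Score": 284.1}


def relative_reduction(old: float, new: float) -> float:
    """Percentage decrease from old to new; positive means the (lower-is-better) metric improved."""
    return 100.0 * (old - new) / old


reductions = {k: round(relative_reduction(before[k], after[k]), 1) for k in before}
print(reductions)  # {'RMSE': 11.4, 'MAE': 11.6, 'Score': 17.0}
```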

Figure 8.

Partial results of RUL prediction with different methods: (a) SVR, (b) CNN, (c) PSO + CNN, (d) DNN, (e) DBN, (f) one layer LSTM, (g) multi-layer LSTM, and (h) the proposed method.

Figure 9.

Normalized values of evaluation criteria for different methods.
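This section does not state how the criteria in Figure 9 are normalized. A minimal sketch, assuming per-criterion min-max scaling across the methods of Table 5 with the N/A entries skipped, is shown below; under this assumption 0 marks the best (lowest) value of each criterion and 1 the worst. The names `TABLE5`, `min_max_normalize`, and `normalize_table` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[float, Optional[float], Optional[float]]

# Table 5 values as (RMSE, MAE, Score); None marks the entries reported as N/A
TABLE5: Dict[str, Row] = {
    "SVR": (15.69, 11.90, 500.8),
    "CNN": (17.46, 13.19, 650.0),
    "PSO + CNN": (16.89, 12.96, 586.9),
    "DNN": (13.56, 10.26, 348.3),
    "DBN": (18.48, 14.23, 700.1),
    "ELSTMNN": (18.22, None, None),
    "RNN-autoencoder": (13.58, None, 220.0),
    "One layer LSTM": (14.85, 11.04, 468.7),
    "Multi-layer LSTM": (13.94, 10.50, 342.3),
    "The proposed method": (12.35, 9.28, 284.1),
}


def min_max_normalize(values: List[Optional[float]]) -> List[Optional[float]]:
    """Scale the available values of one criterion to [0, 1]; None (N/A) stays None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    return [None if v is None else (v - lo) / (hi - lo) for v in values]


def normalize_table(table: Dict[str, Row]) -> Dict[str, Tuple[Optional[float], ...]]:
    methods = list(table)
    # Each criterion (column) is normalized independently across all methods
    columns = [min_max_normalize([table[m][j] for m in methods]) for j in range(3)]
    return {m: tuple(col[i] for col in columns) for i, m in enumerate(methods)}


normalized = normalize_table(TABLE5)
for method, scores in normalized.items():
    print(method, ["N/A" if s is None else f"{s:.3f}" for s in scores])
```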

Conclusions

This paper proposes a direct prediction method for engine RUL based on a PSO-optimized multi-layer LSTM neural network. A composite criterion combining correlation and monotonicity effectively screens out the sensor monitoring data that best reflect the engine degradation trend. Standardizing the data eliminates the effects of the differing units and value ranges of the sensors. Training samples are constructed with a sliding time window, which preserves the temporal correlation between successive points of each sensor sequence. Exploiting the strength of LSTM neural networks in processing time series, a multi-layer LSTM model for engine RUL prediction is built to fully extract the temporal features of the data and predict the engine RUL. The PSO algorithm automatically tunes the hyperparameters of the LSTM prediction model, improving its performance and avoiding the time-consuming and tedious process of manual tuning. NASA's C-MAPSS data set is used as experimental data to verify the performance of the proposed method. The comparison results show that the proposed method achieves higher prediction accuracy than the other methods considered, with the lowest RMSE and MAE among them.
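The PSO hyperparameter search can be illustrated with a minimal global-best PSO in plain Python. This is a sketch under stated assumptions, not the authors' implementation: the inertia weight `w`, the coefficients `c1` and `c2`, the swarm size, the iteration count, the two searched hyperparameters (number of hidden units and log10 of the learning rate), and the objective `surrogate_val_rmse` are all hypothetical. In the described method the objective would presumably be a validation error of the multi-layer LSTM trained with the candidate hyperparameters; here it is replaced by a cheap analytic stand-in so the sketch runs instantly.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iters: int = 50,
    w: float = 0.7,   # inertia weight (assumed value)
    c1: float = 1.5,  # cognitive coefficient (assumed value)
    c2: float = 1.5,  # social coefficient (assumed value)
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO; returns the best position found and its objective value."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the search box, zero initial velocities
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])  # pull toward the particle's own best
                    + c2 * r2 * (gbest[d] - pos[i][d])     # pull toward the swarm's best
                )
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clip to the search box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:  # the swarm best is updated as soon as it improves
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def surrogate_val_rmse(x: List[float]) -> float:
    """Hypothetical stand-in for 'train the multi-layer LSTM and return its validation RMSE'.

    x[0] is the number of hidden units (rounded to an integer); x[1] is log10 of the learning rate.
    """
    units, log10_lr = round(x[0]), x[1]
    return 12.0 + ((units - 64) / 32) ** 2 + (log10_lr + 3.0) ** 2


best, best_val = pso_minimize(surrogate_val_rmse, bounds=[(16, 256), (-5, -1)])
print(f"units={round(best[0])}, learning rate={10 ** best[1]:.2e}, surrogate RMSE={best_val:.3f}")
```

Integer hyperparameters are handled here by rounding inside the objective, which keeps the particle dynamics continuous; searching the learning rate on a log scale is a common design choice because its useful values span several orders of magnitude.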

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: 2018YFE02013 and Grant 2020YFE0204900, and Key Research and Development Plan of Shandong Province under Grant 2019TSLH0301 and Grant 2019GHZ004.

ORCID iDs

Xinlong Li

Fei Miao

References

1. Wang, Guo, Sun X-M. Remaining useful life predictions for turbofan engine degradation based on concurrent semi-supervised model. Neural Comput Appl 2022; 34: 5151–5160.

2. Guo. A review on prognostics methods for engineering systems. IEEE Trans Reliab 2020; 69: 1110–1129.

3. Wang, Wei, et al. Remaining useful life prediction for equipment based on RF-BiLSTM. AIP Adv 2022; 12: 115209.

4. Wang, Lei, et al. A hybrid prognostics approach for estimating remaining useful life of rolling element bearings. IEEE Trans Reliab 2020; 69: 401–412.

5. Najeh, Lundberg. Degradation state prediction of rolling bearings using ARX-Laguerre model and genetic algorithms. Int J Adv Manuf Technol 2021; 112: 1077–1088.

6. Teng, Han, et al. A robust model-based approach for bearing remaining useful life prognosis in wind turbines. IEEE Access 2020; 8: 47133–47143.

7. Khelif, Chebel-Morello, Malinowski, et al. Direct remaining useful life estimation based on support vector regression. IEEE Trans Ind Electron 2017; 64(3): 2276–2285.

8. Yang, Liu, Zio. Remaining useful life prediction based on a double-convolutional neural network architecture. IEEE Trans Ind Electron 2019; 66(12): 9521–9530.

9. Zhang, et al. Data alignments in machinery remaining useful life prediction using deep adversarial neural networks. Knowl Based Syst 2020; 197: 105843.

10. Sbarufatti, Corbetta, Manes, et al. Sequential Monte-Carlo sampling based on a committee of artificial neural networks for posterior state estimation and residual lifetime prediction. Int J Fatigue 2016; 83: 10–23.

11. Deutsch. Using deep learning based approaches for bearing remaining useful life prediction. In: Annual conference of the PHM society, 2016, Vol. 8, No. 1.

12. Sateesh Babu, Zhao. Deep convolutional neural network based regression approach for estimation of remaining useful life. In: International conference on database systems for advanced applications, April 2016, pp.214–228. Cham: Springer.

13. Ren, Sun, Wang, et al. Prediction of bearing remaining useful life with deep convolution neural network. IEEE Access 2018; 6: 13041–13049.

14. Cheng, Zhu, et al. Remaining useful life prognosis based on ensemble long short-term memory neural network. IEEE Trans Instrum Meas 2021; 70: 1–12.

15. Zhang, Xiong, et al. Long short-term memory recurrent neural network for remaining useful life prediction of lithium-ion batteries. IEEE T Veh Technol 2018; 67(7): 5695–5705.

16. Heimes. Recurrent neural networks for remaining useful life estimation. In: 2008 international conference on prognostics and health management, Denver, CO, USA, 06–09 October 2008, pp.1–6. New York, NY: IEEE.

17. Yuan, Lin. Fault diagnosis and remaining useful life estimation of aero engine using LSTM neural network. In: 2016 IEEE international conference on aircraft utility systems (AUS), Beijing, China, 10–12 October 2016, pp.135–140. New York, NY: IEEE.

18. Malhotra, Ramakrishnan, et al. Multi-sensor prognostics using an unsupervised health index based on LSTM encoder-decoder. arXiv preprint arXiv:1608.06154, 2016.

19. Bai, Zhao, Rong HJ. Novel direct remaining useful life estimation of aero-engines with randomly assigned hidden nodes. Neural Comput Appl 2020; 32(18): 14347–14358.

20. Lei, Guo, et al. Machinery health prognostics: a systematic review from data acquisition to RUL prediction. Mech Syst Signal Process 2018; 104: 799–834.

21. Razavi-Far, Chakrabarti, Saif, et al. Extreme learning machine based prognostics of battery life. Int J Artif Intell Tools 2018; 27(08): 1850036.

22. Kong, Tang, Deng, et al. Condition monitoring of wind turbines based on spatio-temporal fusion of SCADA data by convolutional neural networks and gated recurrent units. Renew Energy 2020; 146: 760–768.

23. Jiang, Xiong, et al. Rolling bearing health prognosis using a modified health index based hierarchical gated recurrent unit network. Mech Mach Theory 2019; 133: 229–249.

24. Lin, Liao, et al. A novel product remaining useful life prediction approach considering fault effects. IEEE-CAA J Autom Sin 2021; 8: 1762–1773.

25. Abdul, Al-Talabani, Ramadan DO. A hybrid temporal feature for gear fault diagnosis using the long short term memory. IEEE Sens J 2020; 20(23): 14444–14452.

26. Zhang. Bearing performance degradation assessment using long short-term memory recurrent network. Comput Ind 2019; 106: 14–29.

27. Qin, Xiang, Chai, et al. Macroscopic–microscopic attention in LSTM networks based on fusion features for gear remaining life prediction. IEEE Trans Ind Electron 2020; 67(12): 10865–10875.

28. Song, Liu, Xue, et al. Time-series well performance prediction based on long short-term memory (LSTM) neural network model. J Pet Sci Eng 2020; 186: 106682.

29. Saxena, Goebel. PHM08 Challenge Data Set, NASA Ames Prognostics Data Repository. Moffett Field, CA: NASA Ames Research Center, 2008.

30. da Costa PRDO, Akçay, Zhang, et al. Remaining useful lifetime prediction via deep domain adaptation. Reliab Eng Syst Saf 2020; 195: 106682.

31. Zhu, Chen, Shen. A new data-driven transferable remaining useful life prediction approach for bearing under different working conditions. Mech Syst Signal Process 2020; 139: 106602.

32. Saxena, Goebel, Simon, et al. Damage propagation modeling for aircraft engine run-to-failure simulation. In: 2008 international conference on prognostics and health management, Denver, CO, USA, 06–09 October 2008, pp.1–9. New York, NY: IEEE.

33. Zheng, Ristovski, Farahat, et al. Long short-term memory network for remaining useful life estimation. In: 2017 IEEE international conference on prognostics and health management (ICPHM), Dallas, TX, USA, 19–21 June 2017, pp.88–95. New York, NY: IEEE.

34. Ding, Sun JQ. Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab Eng Syst Saf 2018; 172: 1–11.

35. Kingma. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

36. Kim, Mechefske. An improved similarity-based prognostic algorithm for RUL estimation using an RNN autoencoder scheme. Reliab Eng Syst Saf 2020; 199: 106926.