A kernel principal component analysis–based degradation model and remaining useful life estimation for the turbofan engine

Abstract

Remaining useful life estimation is a complicated and difficult research question in prognostics and health management for maintenance. In this article, we consider the problem of prognostics modeling and estimation for the turbofan engine under complicated circumstances and propose a kernel principal component analysis–based degradation model and remaining useful life estimation method for such aircraft engines. We first use the kernel principal component analysis method to analyze the output data generated by the turbofan engine thermodynamic simulation and identify the qualitative and quantitative relationships between the key factors. Next, we build a degradation model for the engine fault based on the following assumptions: the engine undergoes only gradual, continuously accumulating failure (i.e. no sudden failure is included), and the degradation includes a Wiener process that stands for the engine system drift. To predict the remaining useful life of the turbofan engine, we build a health index based on the degradation model and use the method of maximum likelihood and the data from the thermodynamic simulation model to estimate the parameters of this degradation model. Through the data analysis, we obtain a trend model whose regression curve fits the actual statistical data. Based on the predicted health index model and the data trend model, we estimate the remaining useful life of the aircraft engine as the time at which the index reaches zero. Finally, a case study involving engine simulation data demonstrates the precision and performance advantages of the proposed method: its precision reaches up to 98.9%, and the average precision is 95.8%.

Keywords

Kernel principal component analysis, degradation model, turbofan engine, remaining useful life

Introduction

With increasing flight missions, engine complexity, and level of informatization, along with the frequency of aircraft crashes, ensuring the performance and reliability of aircraft engines has become increasingly important. Under a traditional maintenance strategy, faulty parts of a turbofan engine may not be predicted, isolated, and repaired quickly while the engine is still in an operational state, so the engine may break down before its scheduled overhaul. As a result, turbofan engine availability may decrease while related maintenance costs rise; for example, the maintenance costs of aircraft engines reach almost 70% of their whole life cycle costs.¹ As the maintenance mode changes from time scheduled to condition based, enhancing diagnostics and prognostics, which is the key technology of turbofan engine maintenance, can improve engine reliability and availability while reducing life cycle costs.²

For turbofan engines, improved engine availability and reduced maintenance costs can be achieved by using the engine’s health state information obtained from diagnostic and prognostic analysis. This is an effective way to achieve condition-based maintenance. YG Li and P Nilkitsaranont² described linear and quadratic regression models and used historical health information to estimate the remaining useful life (RUL) of gas turbine engines before the next major engine overhaul. Li et al.³ combined failure modes (criticality information provided by failure mode, effects, and criticality analysis (FMECA)) with reliability analysis to predict the lifetime of the engine’s turbocharger. Vanini et al.⁴ proposed a fault detection and isolation (FDI) system for aircraft gas turbine engines using the multiple model method and dynamic neural networks (DNNs), in which each network corresponds to a specific operating mode. The mode is either a healthy engine or an engine in a failure condition. Basir and Yuan⁵ investigated a multi-sensory fusion model based on the Dempster–Shafer evidence theory to detect the engine quality and predict the engine life cycle. M Chen et al.⁶ proposed a probabilistic model of the turboshaft engine to estimate the engine’s overall performance while considering the effect of many uncertainties. Empirical mode decomposition (EMD) is an extremely useful signal processing technique, and many researchers have used it in gas turbine engine fault diagnosis and RUL prognosis.⁷ The work above indicates that the model is the key factor in RUL prediction. If an insufficient data set is available for algorithm training, modified algorithms should be researched to enhance precision.

Fault diagnosis and prognostics for aircraft gas turbine engines are complicated by manufacturing differences between engines of the same type, degradation during operation, the complexity of a trustworthy turbofan engine model, and the difficulty of obtaining variable parameters from the operating engine.^8,9 Among the approaches above, Y Li and P Nilkitsaranont’s² model needs a great deal of historical health information, Li et al.³ focused on only one component of the engine, Basir and Yuan⁵ did not consider the nonlinear relationship between the sensor outputs, and M Chen et al.’s⁶ probabilistic model contained uncertainties, so its precision leaves room for improvement. In summary, it is difficult to establish an integrated turbofan engine degradation model based on physical mechanisms because of the number of interacting subsystems on which turbofan engines depend.¹⁰ An accurate failure degradation model for estimating the RUL of turbofan engines is not available. As a result, data-driven prognostics were proposed, which use historical lifetime data to deduce the variation tendency of the system.^11,12 However, a data-driven-only approach, which does not include a failure degradation estimate to assess turbofan engine performance and thus predict the RUL, is not sufficient. If a fatigue life formula such as the Arrhenius equation, the Eyring model, or one of their modifications is used in addition to the data-driven approach to predict the RUL of the turbofan engine, the precision will exceed that of any of the individual methods mentioned above.^13,14 In fact, researchers have increasingly focused on combining lifetime estimation based on the variation of performance parameters over time with the data-driven method, as this combined method (which corresponds to both real operational data and engine failure mechanisms) is more effective.¹⁵

In the following sections, we propose a kernel principal component analysis (KPCA)-based degradation model for turbofan engine reliability. We assume a commonly used equation for wear and connect it with the performance variation of the parameters, which enables us to assess cumulative degradation and predict the RUL of the engine. The primary contributions of this article are as follows:

We first analyzed the sensor data generated by the turbofan engine thermodynamic simulation with the KPCA method and identified the qualitative and quantitative relationships between the key factors.

Based on the result of the KPCA analysis of the data generated by the turbofan engine simulation and the current understanding of the physical mechanisms of the engine, we proposed a degradation model for the turbofan engine.

We established a health index equation to predict the RUL of the turbofan engine and used the method of maximum likelihood to estimate the parameters of the health index model on the basis of the data from the simulation model.

This article is organized as follows:

Section “Turbofan engine model” briefly presents the structure of the turbofan engine model.

Section “KPCA-based degradation model” describes the KPCA-based degradation model. It shows the engine system drift as a Wiener process and the result of the KPCA analysis as the key factor. We introduce the principle of the KPCA method and establish the engine degradation model.

Section “Predicting the RUL of the turbofan engine” describes a way of predicting the RUL of the turbofan engine. We propose a health index model of the engine and show how to estimate the model parameters using simulation data and the method of maximum likelihood.

Section “Case study” describes a case study in which we apply the proposed method to model the engine degradation and predict the RUL. Finally, we analyze the practicability and the benefit of our method.

Turbofan engine model

Engine fault detection and prognostics are a representative sensor information fusion problem. The information involved includes speed, vibration, altitude, sound, pressure, temperature, and so on, so it is important to build an appropriate model. This article uses the engine model established by NASA.¹⁶ The thermodynamic turbofan engine simulation model represents a real aircraft turbofan engine. Figure 1 shows the main elements of the thermodynamic turbofan engine model. Figure 2 shows a flowchart of the engine that includes the main modules and describes how the model information is related.^17–19 In the thermodynamic turbofan engine model, rotor and volume dynamics are part of the nonlinear system. The engine model takes the volume dynamics and damage factors into account and includes an unbalanced mass flow rate. The control system of the simulation has three high-limit regulators, which prevent the turbofan engine from exceeding its design limits (such as engine speed, high pressure turbine temperature, and pressure ratio). A further limit regulator prevents the pressure of the high pressure compressor from becoming too low. The corresponding speed limit regulator limits acceleration and deceleration.

Figure 1.

Turbofan engine structure.

Figure 2.

The aircraft engine modules and information flowchart and relevance.^17–19

The thermodynamic turbofan engine simulation has 14 different inputs which include fuel flow, fan efficiency, fan flow, fan pressure-ratio modifier, low pressure compressor efficiency, high pressure compressor efficiency, and so on. It can also generate 21 outputs through the engine model with all inputs. The outputs include fan inlet temperature (T2), low pressure turbine outlet temperature (T50), high pressure compressor outlet temperature (T30), low pressure compressor outlet temperature (T24), actual core speed (Nc), and so on. The inputs enable the user to change the amplitude of variations that simulate the degradation processes containing fan, low pressure compressor, high pressure compressor, low pressure turbine, and high pressure turbine. The outputs contain different sensor measurement data and all of the engine operating life cycle data. In this article, we choose 21 different outputs to analyze and discuss, similar to what is used in the “Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation” document.¹⁶

This section shows the model and the training and testing data used in this article. The data are also used for estimating the RUL of the turbofan engine (the flow diagram of the proposed approach is shown in Figure 3). The proposed approach is presented in detail here.

Figure 3.

Flow diagram of the proposed KPCA-based degradation model approach.

KPCA-based degradation model

As described in section “Turbofan engine model,” 21 different outputs from the gas turbine simulation were chosen to discuss in this article. Because of the large amount of output data, it is difficult to establish the degradation model. In addition, when the dimension of fault coefficient matrix is high, it may lead to a strong smearing effect.²⁰ Even if we establish a 21 dimension model, the calculation amount will be very large. Therefore, the output data should be processed before establishing the degradation model. In this study, the KPCA method was chosen to reduce the dimension of raw data.

Data preprocessing

The maximum and minimum values of a limited data set can differ hugely. Normalization maps the data into a short, bounded range that is easier to analyze. Two global normalization approaches are considered in this study. The first is the min–max normalization shown in equation (1)⁴

x^{'} = 2 \frac{x - x_{\min}}{x_{\max} - x_{\min}} - 1

(1)

where x_min stands for the minimum value of x, x_max stands for the maximum value of x, and x′ represents the normalization value of x. This approach makes the range of x after processing become [−1, 1]. The second approach is shown in equation (2), and is the z-score normalization

x' = \frac{x - μ (x)}{σ (x)}

(2)

where µ(x) represents the mean value of x and σ(x) represents the root-mean-square deviation of x. The z-score normalization generates new data x′ whose mean value is close to 0 and whose root-mean-square deviation is close to 1. This article uses the min–max normalization because the data are difficult to analyze when the new values are close to 0. Generally, data normalization is the first step in data analysis or in data training and testing, and it is the first step of the proposed KPCA-based degradation model approach. The benefit of normalization is that it reduces the probability of abnormal values emerging from the raw data.
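The two normalizations of equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the function names and the sample readings are hypothetical, and the z-score uses the population standard deviation (division by n), which is an assumption because the text does not state which form is meant.

```python
import math


def min_max_normalize(values):
    """Equation (1): map the values linearly onto [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs two distinct values")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def z_score_normalize(values):
    """Equation (2): subtract the mean and divide by the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        raise ValueError("z-score normalization needs non-constant data")
    return [(v - mean) / std for v in values]


# Hypothetical readings of a single sensor over five cycles.
readings = [641.8, 642.2, 642.4, 643.0, 643.5]
scaled = min_max_normalize(readings)
standardized = z_score_normalize(readings)
print(scaled)
print(standardized)
```

With min–max scaling the smallest reading maps to −1 and the largest to 1, whereas the z-score output is centered on 0 with unit spread.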

KPCA

The KPCA method, which introduces a kernel function, is the extension of PCA to the nonlinear domain. The input data x_k (k = 1, 2, …, n) are mapped to the feature space F as Φ(x_k) (k = 1, 2, …, n) by the nonlinear mapping Φ: R^m→F. The basic KPCA method is introduced as follows.^21–25

The covariance matrix C^F must be constructed for principal component (PC) analysis in the high-dimensional feature space F. Assuming that the data have been processed so that the mapped data are centered, that is, $\sum_{k = 1}^{n} Φ (x_{k}) = 0$ (as the z-score normalization above does for the raw data), the covariance matrix is given by

C^{F} = \frac{1}{n} \sum_{i = 1}^{n} Φ (x_{i}) Φ {(x_{i})}^{T}

(3)

So we can obtain the PC of the feature space F through the nonzero eigenvalues λ of the covariance matrix C^F, which can be calculated by

λ V = C^{F} V

(4)

where V is the corresponding eigenvector in the feature space F. And C^FV can be expressed by

C^{F} V = \frac{1}{n} \sum_{i = 1}^{n} 〈 Φ (x_{i}), V 〉 Φ (x_{i})

(5)

where $〈 x, y 〉$ denotes the dot product of x and y. Equation (4) is equivalent to

λ 〈 Φ (x_{k}), V 〉 = 〈 Φ (x_{k}), C^{F} V 〉, k = 1, 2, \dots, n

(6)

Because the eigenvector V lies in the span of Φ(x₁), …, Φ(x_n), it can be written as

V = \sum_{i = 1}^{n} α_{i} Φ (x_{i}), i = 1, 2, \dots, n

(7)

where the α_i are the corresponding coefficients.

Combining equations (3)–(7) gives

λ \sum_{j = 1}^{n} α_{j} 〈 Φ (x_{k}), Φ (x_{j}) 〉 = \frac{1}{n} \sum_{j = 1}^{n} α_{j} \sum_{i = 1}^{n} 〈 Φ (x_{k}), Φ (x_{i}) 〉 〈 Φ (x_{i}), Φ (x_{j}) 〉, k = 1, 2, \dots, n

(8)

To solve the above equation, a kernel matrix K with size n × n is defined, and its elements are given by

K_{ij} = 〈 Φ (x_{i}), Φ (x_{j}) 〉 = k (x_{i}, x_{j}) \Rightarrow n λ K α = K^{2} α

(9)

where $α = [α_{1}, \dots, α_{n}]^{T}$ .

The kernel matrix K can be centered by

{\bar{K}}_{ij} = (K - l_{n} K - K l_{n} + l_{n} K l_{n})_{ij}

(10)

where matrix $l_{n} = (1 / n)_{n \times n}$ .

The Gaussian radial basis function (RBF) kernel is well suited to nonlinear systems, and its feature space mapping distinguishes the characteristics very well. The Gaussian RBF kernel used in this article is

G (x_{i}, y_{j}) = \exp (- \frac{{‖ x_{i} - y_{j} ‖}^{2}}{2 σ^{2}})

(11)

where σ is the dispersion (width) of the Gaussian RBF.
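The kernel matrix of equation (9), the centering of equation (10), and the Gaussian RBF of equation (11) can be sketched in plain Python as follows. The sample vectors and the value σ = 1 are hypothetical, and the eigen-decomposition of the centered matrix is left to a linear algebra library; the sketch only shows how the matrix entering that step is built.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF of equation (11)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(samples, sigma):
    """n x n matrix with K[i][j] = G(x_i, x_j), as in equation (9)."""
    n = len(samples)
    return [[rbf_kernel(samples[i], samples[j], sigma) for j in range(n)]
            for i in range(n)]


def center_kernel(K):
    """Equation (10): K - l_n K - K l_n + l_n K l_n, where l_n is (1/n)_{n x n}."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]            # entries of K l_n
    col_mean = [sum(K[i][j] for i in range(n)) / n    # entries of l_n K
                for j in range(n)]
    total_mean = sum(row_mean) / n                    # entries of l_n K l_n
    return [[K[i][j] - col_mean[j] - row_mean[i] + total_mean
             for j in range(n)] for i in range(n)]


# Hypothetical normalized samples with two sensor channels each.
samples = [[-1.0, -0.8], [-0.2, 0.1], [0.5, 0.4], [1.0, 0.9]]
K = kernel_matrix(samples, sigma=1.0)
K_centered = center_kernel(K)
for row in K_centered:
    print([round(v, 4) for v in row])
```

After centering, every row and column of the matrix sums to zero, which corresponds to the zero-mean assumption on the mapped data in feature space.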

The kth KPCA-transformed feature p_k can be achieved by

p_{k} = 〈 V^{k}, Φ (x) 〉 = \sum_{j = 1}^{n} α_{j}^{k} 〈 Φ (x_{j}), Φ (x) 〉 = \sum_{j = 1}^{n} α_{j}^{k} G (x_{j}, x)

(12)

The computational complexity and the prognostic accuracy are both influenced by the number of PCs. The larger the number of PCs, the more information is retained and the higher the prognostic accuracy, but the longer the analysis takes. In this article, the variance contribution of the kth PC is defined as

Contr (λ_{k}) = \frac{λ_{k}}{\sum_{j = 1}^{n} λ_{j}} \times 100 \%

(13)

The number of PCs is determined by the cumulative percent variance (CPV) criterion

CPV = \frac{\sum_{j = 1}^{l} λ_{j}}{\sum_{j = 1}^{n} λ_{j}} \geq CL

(14)

where λ_j is the eigenvalue of the jth PC with λ₁ ≥ λ₂ ≥ ··· ≥ λ_n, CL is the CPV threshold, l is the number of PCs, and n is the number of data points used to train the KPCA. CL is generally set to 85%, and the number of PCs is determined adaptively by the chosen CL.
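The selection rule of equations (13) and (14) amounts to accumulating the sorted eigenvalues until the threshold CL is reached. A minimal sketch, using hypothetical eigenvalues rather than values from the article:

```python
def contributions(eigenvalues):
    """Equation (13) as fractions: share of each eigenvalue in the total."""
    total = sum(eigenvalues)
    return [lam / total for lam in eigenvalues]


def select_num_pcs(eigenvalues, cl=0.85):
    """Equation (14): smallest l whose cumulative percent variance reaches CL."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, lam in enumerate(ordered, start=1):
        running += lam
        if running / total >= cl:
            return count
    return len(ordered)


# Hypothetical eigenvalues of a centered kernel matrix.
eigenvalues = [6.0, 2.0, 1.0, 0.6, 0.4]
print(contributions(eigenvalues))
print(select_num_pcs(eigenvalues))
```

For these values the CPV is 0.6 after one PC, 0.8 after two, and 0.9 after three, so three PCs are kept at CL = 85%.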

The degradation model

There is no clear engine degradation model based on mechanisms such as component oxidation, bearing stress resulting in engine breakdown, deformation during operation, or gas path temperatures. Two commonly used estimation models are described here.

The first is the Arrhenius model, which is used for machine faults. The Arrhenius model is expressed by

t_{f} = A e^{\frac{Δ H}{kT}}

(15)

where t_f is the time at which the fault occurs, T is the temperature when the failure appeared or the machine stopped working, k is a constant, A is a proportional coefficient, and ΔH is the energy value. Originally, this model described the process of chemical reaction and energy dissemination (i.e. it was used in physics and for electronic devices). More recently, the modified Arrhenius model has been used for machines and in other areas.

The second is the Eyring model, and the equation is given by

t_{f} = A T^{α} e^{(\frac{Δ H}{kT} + (B + \frac{C}{T}) S_{1} + (D + \frac{E}{T}) S_{2})}

(16)

where t_f is the time at which the fault occurs; α, ΔH, A, B, C, D, and E are constants with stress variation; S₁ and S₂ are related stresses; k is a constant; and T is the absolute temperature. The model describes the relationship between failure time and stress. The advantage of the Eyring model is that the equipment’s working temperature and stress are combined in one equation. Its weakness is that there are too many constants to be defined and parameters to be measured.¹⁶
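To make the two failure-time formulas concrete, equations (15) and (16) can be evaluated directly. The sketch below assumes that k is Boltzmann's constant in eV/K (the text only calls it a constant), that ΔH is given in eV, and that T is in kelvin; all numerical inputs are hypothetical.

```python
import math

# Assumption: k in equations (15) and (16) is Boltzmann's constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_time_to_failure(a, delta_h, temperature_k):
    """Equation (15): t_f = A * exp(dH / (k * T))."""
    return a * math.exp(delta_h / (BOLTZMANN_EV_PER_K * temperature_k))


def eyring_time_to_failure(a, alpha, delta_h, temperature_k, b, c, s1, d, e, s2):
    """Equation (16): temperature and two stresses S1, S2 in one expression."""
    exponent = (delta_h / (BOLTZMANN_EV_PER_K * temperature_k)
                + (b + c / temperature_k) * s1
                + (d + e / temperature_k) * s2)
    return a * temperature_k ** alpha * math.exp(exponent)


# Hypothetical inputs: A = 1 and an energy value of 0.5 eV.
t_cool = arrhenius_time_to_failure(1.0, 0.5, 350.0)
t_hot = arrhenius_time_to_failure(1.0, 0.5, 400.0)
print(t_cool, t_hot)
```

Under these assumptions the predicted time to failure shrinks as the temperature rises, and the Eyring expression reduces to the Arrhenius one when α = 0 and both stresses are zero.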

We can find that the Arrhenius model and Eyring model are both the exponential function for the failure degradation. Although it is hard to establish a precise model for the complicated system that just uses a single function, an exponential function could have the ability to accommodate the variation law of deformation, temperature, oxidation, blade attrition, and engine breakdown.²⁶ By combining our results from the KPCA analysis simulation and the current understanding about the failure mechanism using the Arrhenius and Eyring models, we have devised a degradation model for the turbofan engine. The degradation model uses the results of the KPCA analysis, with the KPCA-transformed feature p_k as a dynamic covariate. Based on the data analysis and physical principle, we established an equation for the degradation rate

w (t) = \sum_{k = 1}^{l} w_{k} e^{\frac{- b}{p_{k} (t)}}

(17)

where b stands for an unknown constant, p_k(t) is the kth KPCA-transformed feature at time t, w_k is a constant, t is the operating time, and l is the number of PCs. The trend model of the PCs is introduced in section “Case study.” In this equation, we ignore sudden failure and assume that the fault accumulates constantly and slowly.

Based on equation (17), the degradation X(t) at time t is expressed as

X (t \mid p_{k} (τ), 0 \leq τ \leq t) = \int_{0}^{t} \sum_{k = 1}^{l} w_{k} e^{\frac{- b}{p_{k} (τ)}} d τ + B (t)

(18)

where B(t) is a Wiener process with variance parameter $σ_{B}^{2}$. Because the degradation is cumulative, equation (18) is the integral of equation (17) plus the Wiener term. There are three unknown parameters in equation (18)

θ = (w_{k}, b, σ_{B}^{2})
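A discretized version of equations (17) and (18) helps to see how the drift and the Wiener term combine. The sketch below is an illustration under stated assumptions, not the article's code: the integral is replaced by a sum over cycles of length dt, the Wiener increment over one cycle is drawn as a normal variable with standard deviation σ_B·√dt, and the PC values and parameters are hypothetical (positive p_k are used so that the exponent is well defined).

```python
import math
import random


def degradation_rate(p, w, b):
    """Equation (17): sum over k of w_k * exp(-b / p_k(t))."""
    return sum(w_k * math.exp(-b / p_k) for w_k, p_k in zip(w, p))


def simulate_degradation(pc_series, w, b, sigma_b, dt=1.0, seed=0):
    """Discretized equation (18): cumulative drift plus Wiener increments."""
    rng = random.Random(seed)
    x = 0.0
    path = [x]
    for p in pc_series:
        drift = degradation_rate(p, w, b) * dt
        noise = rng.gauss(0.0, sigma_b * math.sqrt(dt))
        x += drift + noise
        path.append(x)
    return path


# Hypothetical trajectory of l = 2 KPCA features over 5 cycles.
pc_series = [[1.0 + 0.1 * t, 2.0] for t in range(5)]
path = simulate_degradation(pc_series, w=[0.5, 0.5], b=1.0, sigma_b=0.05)
print([round(x, 4) for x in path])
```

Setting sigma_b to zero removes the Wiener term and leaves only the monotonically accumulating drift.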

Predicting the RUL of the turbofan engine

Health index equation of the turbofan engine

As is generally known, to reduce maintenance costs and engine downtime, save labor and repair time, and extend the service life of the turbofan engine, traditional hard time maintenance and failure-based maintenance should change to condition-based maintenance.²⁷ Predicting the RUL is a very important part of condition-based maintenance, because the maintenance schedule must be designed based on the RUL result.²⁸ A model can be used to describe the relationship between the engine’s health state and the outputs of the engine sensors.^29,30 However, it is still difficult to implement fault prognostics and RUL prediction for the engine because of the complexity of the turbofan engine system and the absence of a relevant structure describing the relationship between the high-dimensional operating parameters and system performance.

Nonetheless, if the unknown parameters and the trend of each PC of the engine failure data can be determined according to model (18), the wear degradation can be estimated and the RUL predicted.

In order to estimate the engine RUL directly, a turbofan engine health index model, H(t), was built based on the wear degradation model and the fault threshold. The fault threshold, th_w, is the value of accumulated wear at which the engine stops working. The health index model is given by

H (t) = 1 - \frac{X (t)}{t h_{w}}

(19)

Combining the wear and threshold parameters as $β_{k} = w_{k} / t h_{w}$, the health index model is

H (t \mid p_{k} (τ), 0 \leq τ \leq t) = 1 - \int_{0}^{t} \sum_{k = 1}^{l} β_{k} e^{\frac{- b}{p_{k} (τ)}} d τ - B (t)

(20)

Equation (20) is obtained by substituting equation (18) into equation (19). From this equation, it is clear that the turbofan engine fails when the health index equals zero, so the RUL of the turbofan engine can be obtained through the health index.
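Reading the RUL off the health index can be sketched as follows: equation (19) converts a degradation trajectory into H(t), and the RUL at a given cycle is the number of cycles until H first reaches zero. The trajectory and threshold below are hypothetical, and the helper names are not from the article.

```python
def health_index(degradation, threshold):
    """Equation (19): H(t) = 1 - X(t) / th_w for each cycle."""
    return [1.0 - x / threshold for x in degradation]


def first_failure_cycle(health):
    """Index of the first cycle at which the health index is zero or below."""
    for cycle, h in enumerate(health):
        if h <= 0.0:
            return cycle
    return None  # no failure within the predicted horizon


def remaining_useful_life(health, current_cycle):
    """Cycles left between the current cycle and the predicted failure."""
    failure = first_failure_cycle(health)
    if failure is None:
        return None
    return max(failure - current_cycle, 0)


# Hypothetical predicted degradation and fault threshold th_w = 2.0.
degradation = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
health = health_index(degradation, threshold=2.0)
print(health)
print(remaining_useful_life(health, current_cycle=1))
```

Here the health index reaches zero at cycle 4, so the RUL seen from cycle 1 is 3 cycles.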

The method of estimating the health index parameters is also very important. The Fourier transform is good at analyzing signals in the frequency domain and decomposing data into different frequencies, but it is not appropriate for nonlinear signals.³¹ Wavelet packet energies are well suited to nonlinear signal processing, but this method lacks a way to select the wavelet filter parameters. The hidden Markov model (HMM) is well suited to handling unknown parameters when the established model is a Markov process, and the HMM approach is usually adopted for dynamic fault models and prognostic problems on nonlinear data. The disadvantage of the HMM method is that it is hard to process a failure model with big data.³² Maximum likelihood estimation (MLE) is a statistical method whose main idea is to choose the parameter values that maximize the probability of the observed sample. In other words, the known sample is used to infer the parameter values most likely to have produced it.³³ It is therefore a very useful method for calculating parameters from a time series. In this study, we choose the MLE method to calculate the parameters of the health index equation.

MLE of the health index prediction

We assume that there are n output samples of the turbofan engine failure simulation. For sample unit i, the operating cycles are t_i,g, g = 0, 1, 2, … with t_i,0 = 0, and X_i(t_i,g) represents the degradation level. The PCs of unit i at time j are p_i,j, j = 0, 1, 2, … In the following discussion, the unknown parameters are estimated by applying the MLE method to the data generated by the engine simulation. Our analysis in the following figure shows that the wear of engine components increases as the number of cycles increases.

According to health index equation (20), let

v_{i, g} = \sum_{j = 1}^{g} \sum_{k = 1}^{l} e^{\frac{- b}{p_{k} (t_{i, j})}}

(21)

For sample i at time t_i,g, the health index equation can be written as

H_{i} (t_{i, g}) = 1 - \int_{0}^{t_{i, g}} \sum_{k = 1}^{l} β_{k} e^{\frac{- b}{p_{k} (τ)}} d τ - B (t_{i, g}) \approx 1 - \sum_{j = 1}^{g} \sum_{k = 1}^{l} β_{k} e^{\frac{- b}{p_{k} (t_{i, j})}} - B (t_{i, g})

(22)

where

E [H_{i} (t_{i, g})] = 1 - β_{k} \cdot v_{i, g}

(23)

\begin{matrix} cov [H_{i} (t_{i, g}), H_{i} (t_{i, h})] & = cov [(1 - \int_{0}^{t_{i, g}} \sum_{k = 1}^{l} β_{k} e^{\frac{- b}{p_{k} (τ)}} d τ - B (t_{i, g})), (1 - \int_{0}^{t_{i, h}} \sum_{k = 1}^{l} β_{k} e^{\frac{- b}{p_{k} (τ)}} d τ - B (t_{i, h}))] \\ & = cov [1, (1 - \int_{0}^{t_{i, h}} \sum_{k = 1}^{l} β_{k} e^{\frac{- b}{p_{k} (τ)}} d τ - B (t_{i, h}))] \\ & - cov [(\int_{0}^{t_{i, g}} \sum_{k = 1}^{l} β_{k} e^{\frac{- b}{p_{k} (τ)}} d τ + B (t_{i, g})), (1 - \int_{0}^{t_{i, h}} \sum_{k = 1}^{l} β_{k} e^{\frac{- b}{p_{k} (τ)}} d τ - B (t_{i, h}))] \\ & = 0 - \{ cov [(\int_{0}^{t_{i, g}} \sum_{k = 1}^{l} β_{k} e^{\frac{- b}{p_{k} (τ)}} d τ + B (t_{i, g})), 1] \\ & - cov [(\int_{0}^{t_{i, g}} \sum_{k = 1}^{l} β_{k} e^{\frac{- b}{p_{k} (τ)}} d τ + B (t_{i, g})), (\int_{0}^{t_{i, h}} \sum_{k = 1}^{l} β_{k} e^{\frac{- b}{p_{k} (τ)}} d τ + B (t_{i, h}))] \} \\ & = 0 - \{ 0 - cov [(\int_{0}^{t_{i, g}} \sum_{k = 1}^{l} β_{k} e^{\frac{- b}{p_{k} (τ)}} d τ + B (t_{i, g})), (\int_{0}^{t_{i, h}} \sum_{k = 1}^{l} β_{k} e^{\frac{- b}{p_{k} (τ)}} d τ + B (t_{i, h}))] \} \\ & = σ_{B}^{2} \cdot \min \{ t_{i, g}, t_{i, h} \} \end{matrix}

(24)

Let Y_i = (Y_i,1, …, Y_i,m) stand for the degradation increments, that is

Y_{i} (t_{i, g}) = H_{i} (t_{i, g}) - H_{i} (t_{i, g - 1}), g = 1, 2, \dots, m

(25)

Let $Δ t_{i, g} = t_{i, g} - t_{i, g - 1}$ and $Δ v_{i, g} = v_{i, g} - v_{i, g - 1}$. Then we obtain

Y_{i} \sim N (μ_{i}, Σ_{i})

where $μ_{i} = β_{k} Δ v_{i}$ and $Σ_{i} = σ_{B}^{2} \cdot diag (Δ t_{i})$ .

In the above equation, $Δ v_{i} = (Δ v_{i, 1}, Δ v_{i, 2}, \dots, Δ v_{i, m})'$ , $Δ t_{i} = (Δ t_{i, 1}, Δ t_{i, 2}, \dots, Δ t_{i, m})'$ .

The likelihood function for the observation data D is

L (θ | D) = \prod_{i = 1}^{n} \frac{1}{{(2 π)}^{\frac{m}{2}} \sqrt{| Σ_{i} |}} \times \exp \{ - \frac{1}{2} (y_{i} - β_{k} Δ v_{i})^{'} Σ_{i}^{- 1} (y_{i} - β_{k} Δ v_{i}) \}

(26)

The relevant log-likelihood formula is

l (θ | D) = Const - nm \ln σ_{B} - \frac{1}{2 σ_{B}^{2}} \sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{{(y_{i, j} - β_{k} Δ v_{i, j})}^{2}}{Δ t_{i, j}}

(27)

Differentiating equation (27) with respect to σ_B and β_k gives

\frac{\partial}{\partial σ_{B}} l (θ | D) = - nm \frac{1}{σ_{B}} + \frac{1}{σ_{B}^{3}} \sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{{(y_{i, j} - β_{k} Δ v_{i, j})}^{2}}{Δ t_{i, j}}

(28)

and

\frac{\partial}{\partial β} l (θ | D) = \frac{1}{σ_{B}^{2}} \sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{Δ v_{i, j} (y_{i, j} - β_{k} Δ v_{i, j})}{Δ t_{i, j}}

(29)

Setting the two derivatives to zero gives the restricted maximum likelihood estimations (RMLEs) of σ_B and β_k through the following two formulas

\left\{ \begin{matrix} {\hat{σ}}_{B}^{2} = \frac{1}{nm} \sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{{(y_{i, j} - {\hat{β}}_{k} Δ v_{i, j})}^{2}}{Δ t_{i, j}} \\ {\hat{β}}_{k} = \frac{[\sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{Δ v_{i, j} (k) y_{i, j}}{Δ t_{i, j}}]}{[\sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{Δ v_{i, j}^{2} (k)}{Δ t_{i, j}}]} \end{matrix} \right.

(30)

Substituting formula (30) into formula (27) gives the profile log-likelihood

\begin{matrix} l (b_{k} | D) & = Const - nm \ln {\hat{σ}}_{B} - \frac{1}{2 {\hat{σ}}_{B}^{2}} \sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{{(y_{i, j} - {\hat{β}}_{k} Δ v_{i, j})}^{2}}{Δ t_{i, j}} \\ & = Const - \frac{nm}{2} \ln [\frac{1}{nm} \sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{{(y_{i, j} - {\hat{β}}_{k} Δ v_{i, j})}^{2}}{Δ t_{i, j}}] \\ & - \frac{1}{2 \frac{1}{nm} \sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{{(y_{i, j} - {\hat{β}}_{k} Δ v_{i, j})}^{2}}{Δ t_{i, j}}} \sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{{(y_{i, j} - {\hat{β}}_{k} Δ v_{i, j})}^{2}}{Δ t_{i, j}} \\ & = Const - \frac{nm}{2} \ln [\frac{1}{nm} \sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{{(y_{i, j} - Δ v_{i, j} \times \frac{[\sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{Δ v_{i, j} y_{i, j}}{Δ t_{i, j}}]}{[\sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{Δ v_{i, j}^{2}}{Δ t_{i, j}}]})}^{2}}{Δ t_{i, j}}] - \frac{nm}{2} \end{matrix}

(31)

From equation (31), we can obtain $\hat{b}$ by finding the value of b that maximizes l(b_k|D) over the whole range of b. We can then determine ${\hat{σ}}_{B}$ and ${\hat{β}}_{k}$ by putting $\hat{b}$ into equation (30).
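The estimation procedure of equations (30) and (31) can be sketched as a grid search over b with closed-form updates for the other two parameters. The sketch makes several simplifying assumptions that are not from the article: a single β shared by all PCs, records pooled over all units and cycles, the sign convention Y ~ N(βΔv, σ_B²Δt) as written in the text, and synthetic data generated inside the example.

```python
import math
import random


def delta_v(pcs, b):
    """One-cycle increment of equation (21): sum over k of exp(-b / p_k)."""
    return sum(math.exp(-b / p) for p in pcs)


def closed_form_estimates(records, b):
    """Equation (30) for a fixed b; records hold (y, pcs, dt) per cycle."""
    num = den = 0.0
    for y, pcs, dt in records:
        dv = delta_v(pcs, b)
        num += dv * y / dt
        den += dv * dv / dt
    beta = num / den
    sigma2 = sum((y - beta * delta_v(pcs, b)) ** 2 / dt
                 for y, pcs, dt in records) / len(records)
    return beta, sigma2


def profile_log_likelihood(records, b):
    """Equation (31) without the constant: -(N/2) ln(sigma2_hat) - N/2."""
    _, sigma2 = closed_form_estimates(records, b)
    n_obs = len(records)
    return -0.5 * n_obs * math.log(sigma2) - 0.5 * n_obs


def fit_health_index(records, b_grid):
    """Pick b on the grid by profile likelihood, then apply equation (30)."""
    b_hat = max(b_grid, key=lambda b: profile_log_likelihood(records, b))
    beta_hat, sigma2_hat = closed_form_estimates(records, b_hat)
    return b_hat, beta_hat, sigma2_hat


# Synthetic, purely illustrative data (not from the article).
rng = random.Random(1)
TRUE_B, TRUE_BETA, TRUE_SIGMA = 2.0, 0.01, 1e-5
records = []
for j in range(60):
    pcs = (1.0 + 2.0 * j / 59.0, 2.0)  # hypothetical positive PC values
    dt = 1.0
    noise = rng.gauss(0.0, TRUE_SIGMA * math.sqrt(dt))
    records.append((TRUE_BETA * delta_v(pcs, TRUE_B) + noise, pcs, dt))

b_grid = [0.5 * step for step in range(1, 9)]  # 0.5, 1.0, ..., 4.0
b_hat, beta_hat, sigma2_hat = fit_health_index(records, b_grid)
print(b_hat, beta_hat, sigma2_hat)
```

A coarse grid is used only to keep the example short; in practice the grid for b would be refined around the maximum, or a one-dimensional optimizer would be applied to the profile log-likelihood.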

RUL-predicted result valuation

RUL-predicted result valuation is used to evaluate the output of the health index model. It is different from error or variance, which are purely quantitative measures and cannot reflect the weight of the predicted result. The difference between the pessimistic useful life (where the predicted RUL is shorter than the real life) and the optimistic useful life (where the predicted RUL is longer than the real life) is shown in Figure 4. This figure also shows the failure degradation that considers only a constant fault incident without a sudden fault incident. In turbofan engine health management, a short predicted RUL can reduce the engine fault incidence rate but increases maintenance costs; a long predicted RUL can save maintenance costs, but engine breakdown may happen while the aircraft is flying. Therefore, early prediction and late prediction must be evaluated differently.

Figure 4.

Degradation and prognostic score model.

In this study, RUL-predicted result valuation is the sum of the prediction errors of the health index model with different weights. We assign a higher weight to early prediction than to late prediction. The score equation for RUL-predicted result valuation is given in equation (32). From this equation, we can see that the lower the score is, the better the prediction is rated, which is shown in Figure 4

s = \left\{ \begin{matrix} \sum_{i = 1}^{n} (e^{- (\frac{d}{a_{1}})} - 1) & for d < 0 \\ \sum_{i = 1}^{n} (e^{(\frac{d}{a_{2}})} - 1) & for d \geq 0 \end{matrix} \right.

(32)

where s represents the evaluation score, n is the number of samples, d denotes the predicted RUL minus the real RUL, a₁ = 10, and a₂ = 13.¹⁶
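The score of equation (32) is straightforward to compute once the error d of each sample is known. The sketch below assumes that the exponential penalty is evaluated per sample and then summed, with the branch chosen by the sign of each d; the example errors are hypothetical.

```python
import math


def prognostic_score(errors, a1=10.0, a2=13.0):
    """Equation (32): errors holds d = predicted RUL - real RUL per sample."""
    total = 0.0
    for d in errors:
        if d < 0:
            total += math.exp(-d / a1) - 1.0  # early (pessimistic) prediction
        else:
            total += math.exp(d / a2) - 1.0   # late (optimistic) prediction
    return total


# Hypothetical errors in cycles for four engines.
errors = [-10.0, 10.0, 0.0, 4.0]
print(prognostic_score(errors))
```

With a₁ = 10 and a₂ = 13 as stated in the text, an error of −10 cycles contributes e − 1 ≈ 1.72, whereas an error of +10 cycles contributes about 1.16, and an exact prediction contributes zero.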

Case study

In order to demonstrate the KPCA-based degradation model of the turbofan engine, we simulated the model using the data generated from the turbofan engine thermodynamic simulation. The data set is composed of engine operating data from a normal state to a failure state. All engines are of the same type and from the same factory, but the initial wear, the product deviation, and the degradation rate may be different. Each engine works normally at the beginning, fault degradation begins to appear at some point in time, and the engine then stops working when it breaks down. In this study, we chose the operating data sets of 20 engine samples to demonstrate and simulate the RUL estimation model.

Figure 5 displays the total temperature at the low pressure compressor (LPC) outlet for engine 1 to engine 5, and Figure 6 shows the high pressure compressor (HPC) outlet temperature. The result of the engine’s thermodynamic simulation for physical core speed is shown in Figure 7. The degradation start time and the appearance of the fault cannot be seen clearly in the raw data. However, careful observation of Figures 5–7 reveals that each sample can be regarded as increasing over all time cycles. In addition, when the physical core speed increases, the temperatures of the HPC and LPC may increase at the same time.

Figure 5.

The simulation result of C-MAPSS for total temperature at LPC.

Figure 6.

The simulation result of C-MAPSS for total temperature at HPC.

Figure 7.

The simulation result of C-MAPSS for physical core speed.

Data processing and KPCA analysis

Table 1 lists the variance contribution Contr(p_k) and the cumulative percent variance (CPV) of the PCs obtained from the KPCA transformation; the number of PCs, k, is determined from this table. The threshold CL is generally set to 85%, and in Table 1 the CPV of p₁, p₂, p₃ exceeds 93% for all engines. Since CPV clearly exceeds CL, the number of PCs is 3, that is, k = 3. Figures 8 and 9 show the variation trend of the KPCA-transformed features p₁, p₂, p₃ of engine 1 and engine 3. The PCs do not directly represent temperature or speed, but Figures 8 and 9 show the variation trend of the whole engine system. The cyan and yellow curves, which show no degradation trend, clearly differ from the red curve in both figures. The reason is that the red curve, p₁, carries 71.44% of the total information content for engine 1 and 98.05% for engine 3 (Table 1), whereas the cyan and yellow curves change smoothly because p₂ and p₃ carry little information.

Table 1.

The variance contribution and cumulative percent variance of PCs.

PC	Engine 1 Contr	Engine 1 CPV	Engine 2 Contr	Engine 2 CPV	Engine 3 Contr	Engine 3 CPV	Engine 4 Contr	Engine 4 CPV	Engine 5 Contr	Engine 5 CPV
p₁	0.7144	0.7144	0.8839	0.8839	0.9805	0.9805	0.9812	0.9812	0.9448	0.9448
p₂	0.1109	0.8254	0.0464	0.9303	0.0079	0.9884	0.0076	0.9887	0.0205	0.9654
p₃	0.1054	0.9307	0.0385	0.9688	0.0062	0.9946	0.0066	0.9953	0.0193	0.9847
p₄	0.0599	0.9906	0.0275	0.9963	0.0049	0.9994	0.0041	0.9994	0.0137	0.9984
p₅	0.0063	0.9970	0.0026	0.9989	0.0004	0.9998	0.0004	0.9999	0.0011	0.9996
p₆	0.0015	0.9985	0.0005	0.9994	0.0001	0.9999	0.0001	0.9999	0.0002	0.9997
p₇	0.0008	0.9993	0.0003	0.9997	0.0000	1.0000	0.0000	1.0000	0.0001	0.9999
p₈	0.0006	0.9998	0.0002	0.9999	0.0000	1.0000	0.0000	1.0000	0.0001	1.0000
p₉	0.0001	0.9999	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000
p₁₀	0.0001	1.0000	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000
p₁₁	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000
p₁₂	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000
p₁₃	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000
p₁₄	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000
p₁₅	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000	0.0000	1.0000
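The CPV rule for choosing k can be illustrated with the first five variance contributions of each engine in Table 1. The sketch below is only one possible reading of the selection step: the helper name `smallest_k` is hypothetical, and taking the largest per-engine value so that a single k serves all engines is an assumption.

```python
from itertools import accumulate

def smallest_k(contr, cl=0.85):
    """Return the smallest number of PCs whose cumulative percent
    variance (running sum of the contributions) exceeds the threshold cl."""
    for k, cpv in enumerate(accumulate(contr), start=1):
        if cpv > cl:
            return k
    return len(contr)

# Variance contributions of p1..p5 for engines 1 to 5, copied from Table 1.
contr_by_engine = {
    1: [0.7144, 0.1109, 0.1054, 0.0599, 0.0063],
    2: [0.8839, 0.0464, 0.0385, 0.0275, 0.0026],
    3: [0.9805, 0.0079, 0.0062, 0.0049, 0.0004],
    4: [0.9812, 0.0076, 0.0066, 0.0041, 0.0004],
    5: [0.9448, 0.0205, 0.0193, 0.0137, 0.0011],
}

k_per_engine = {e: smallest_k(c) for e, c in contr_by_engine.items()}
# Assumption: one common k is used for all engines, so take the maximum.
k = max(k_per_engine.values())
print(k_per_engine, k)  # {1: 3, 2: 1, 3: 1, 4: 1, 5: 1} 3
```

Under this reading, engine 1 is the sample that requires three PCs to pass the 85% threshold, which agrees with k = 3.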

Figure 8.

The KPCA-transformed feature p₁, p₂, p₃ of engine 1.

Figure 9.

The KPCA-transformed feature p₁, p₂, p₃ of engine 3.

The parameters of health index modeling

Figure 10 displays the profile log-likelihood function of b obtained from equation (31). The horizontal axis represents the value of b, and the vertical axis represents the profile log-likelihood. We obtain $\hat{b}$ as the value of b that maximizes l(b_k|D) over the whole range of b. Taking the KPCA-transformed features p₁, p₂, p₃ as covariates and fitting our health index model for the turbofan engine, we get the corresponding MLEs

\hat{b} = 0.6270, {\hat{\beta}}_{1} = 0.7491, {\hat{\beta}}_{2} = 3.1415 \times 10^{-15}, {\hat{\beta}}_{3} = 2.3168 \times 10^{-15}, {\hat{\sigma}}_{B}^{2} = 0.5790
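The grid search behind a profile log-likelihood curve such as Figure 10 can be sketched as follows. This is not equation (31): as a stand-in, the sketch assumes a covariate-free Wiener process on the transformed time scale t^b, x(t) = μ t^b + σ B(t^b), for which μ and σ² can be maximized out in closed form for each fixed b. The function names, the stand-in model, and the synthetic data are all assumptions made for illustration.

```python
import math

def profile_loglik(b, t, x):
    """Profile log-likelihood of b for the stand-in model
    x(t) = mu * t**b + sigma * B(t**b), with mu and sigma^2 replaced
    by their closed-form maximizers. Requires increasing t > 0 and b > 0."""
    tau = [ti ** b for ti in t]
    d_tau = [tau[i] - tau[i - 1] for i in range(1, len(tau))]
    d_x = [x[i] - x[i - 1] for i in range(1, len(x))]
    n = len(d_x)
    # Increments are N(mu * d_tau, sigma^2 * d_tau), which gives these MLEs.
    mu_hat = sum(d_x) / sum(d_tau)
    sigma2_hat = sum((dx - mu_hat * dt) ** 2 / dt
                     for dx, dt in zip(d_x, d_tau)) / n
    return (-0.5 * n * math.log(2.0 * math.pi * sigma2_hat)
            - 0.5 * sum(math.log(dt) for dt in d_tau)
            - 0.5 * n)

def estimate_b(t, x, grid):
    """Return the grid value of b with the largest profile log-likelihood."""
    return max(grid, key=lambda b: profile_loglik(b, t, x))

# Synthetic path generated with b = 0.5 plus a small alternating disturbance.
t = [float(i) for i in range(1, 52)]
x = [2.0 * ti ** 0.5 + 1e-4 * (-1) ** i for i, ti in enumerate(t)]
grid = [i / 100 for i in range(10, 101)]
b_hat = estimate_b(t, x, grid)
print(b_hat)
```

Plotting `profile_loglik` over `grid` gives a curve of the same kind as Figure 10; the estimate is the abscissa of its maximum.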

Figure 10.

The profile log-likelihood function of b.

PCs trend modeling

Through sample analysis, we obtain a regression trend model that fits the data set; the fitted curve p(t) is plotted in red in Figure 11. The trend model is

p(t) = a + c_{1} \cos(\omega t) + d_{1} \sin(\omega t) + c_{2} \cos(2 \omega t) + d_{2} \sin(2 \omega t)

(33)

where a, c₁, c₂, d₁, d₂, and $\omega$ are parameters to be estimated.

Figure 11.

The regression fitting of p(t).

To estimate the parameters a, c₁, c₂, d₁, d₂, and $\omega$, we use regression fitting with additional parameters and obtain the initial parameters

{\hat{a}}_{0} = - 12.17, {\hat{c}}_{10} = 17.49, {\hat{c}}_{20} = 2.885, {\hat{d}}_{10} = 18.15, {\hat{d}}_{20} = - 6.612, {\hat{\omega}}_{0} = 0.01625

These coefficients are given with 95% confidence bounds.
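Equation (33) is a two-harmonic Fourier series with fundamental angular frequency ω, so it can be evaluated directly once the six parameters are known. The sketch below plugs in the initial estimates reported above; the function name `trend` and the chosen evaluation cycles are illustrative assumptions.

```python
import math

def trend(t, a, c1, d1, c2, d2, omega):
    """Evaluate the two-harmonic trend model of equation (33) at time t."""
    return (a
            + c1 * math.cos(omega * t) + d1 * math.sin(omega * t)
            + c2 * math.cos(2 * omega * t) + d2 * math.sin(2 * omega * t))

# Initial parameter estimates reported in the text.
params = dict(a=-12.17, c1=17.49, d1=18.15, c2=2.885, d2=-6.612, omega=0.01625)

# Evaluate the fitted trend at a few cycles.
for cycle in (0, 50, 100, 150):
    print(cycle, round(trend(cycle, **params), 3))

# The model repeats with period 2*pi/omega, measured in cycles.
period = 2.0 * math.pi / params["omega"]
print(round(period, 1))
```

At t = 0 the sine terms vanish, so p(0) = a + c₁ + c₂ = 8.205, which gives a quick check on the parameter order.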

To predict the future path of the p(t) process, we used the transformed initial data and the trend model to generate trend lines. The results are displayed in Figure 12.

Figure 12.

PCs trend modeling of engine 1 to engine 20.

Predicting RUL of a turbofan engine

Next, we used the health index model specified in equation (20) to predict the state of the turbofan engine. By feeding the result of the PCs trend modeling into the model and comparing the resulting system health index with the failure threshold of zero, we can estimate the RUL of the turbofan engine. The predicted lifetime of engine 1 and its distribution are plotted in Figure 13. The horizontal axis represents the operating time over the engine's lifetime, and the points at which the red and blue lines cross the horizontal axis represent the real and estimated RUL, respectively.
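The threshold-crossing step can be sketched as follows, assuming the predicted health index is available as a sequence of values at known cycles. The function names and the use of linear interpolation between samples are assumptions for illustration; the health index itself would come from equation (20).

```python
def first_zero_crossing(cycles, health):
    """Return the cycle at which the health index first reaches zero,
    interpolating linearly between samples, or None if it never does."""
    for i, h in enumerate(health):
        if h <= 0.0:
            if i == 0:
                return float(cycles[0])
            h0 = health[i - 1]  # last positive value before the crossing
            frac = h0 / (h0 - h)
            return cycles[i - 1] + frac * (cycles[i] - cycles[i - 1])
    return None

def remaining_useful_life(current_cycle, cycles, health):
    """RUL = predicted failure cycle minus the current cycle (never negative)."""
    failure = first_zero_crossing(cycles, health)
    if failure is None:
        return None
    return max(failure - current_cycle, 0.0)

# Toy health index that crosses zero between cycles 3 and 4.
cycles = [0, 1, 2, 3, 4]
health = [3.0, 2.0, 1.0, 0.5, -0.5]
print(first_zero_crossing(cycles, health))       # 3.5
print(remaining_useful_life(1, cycles, health))  # 2.5
```

Returning None when the index stays positive keeps "no predicted failure within the horizon" distinct from an RUL of zero.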

Figure 13.

Predicted remaining useful life.

Results and discussion

The predicted results for all 20 engines are shown in Figure 14. We use the scoring function in equation (32) to calculate the error and score of the prognostics results, which are listed in Table 2. The lower the score for an engine, the better the precision. The precision of the method can reach 98.9%, and the average precision is 95.8%. Table 2 shows that more than half of the engines (12 of 20) have a score below 2. This indicates that the predictions are good and that the health index model is suited to the prognostics problem.

Figure 14.

Prognostics results of the turbofan engine RUL.

Table 2.

The error and score of prognostics results.

Engine no.	Error	Score
1	−4	0.49
2	−37	39.44
3	−29	17.14
4	1	0.11
5	6	0.82
6	12	2.32
7	−9	1.46
8	1	0.11
9	19	5.69
10	−12	2.32
11	−10	1.72
12	−12	2.32
13	7	1.01
14	9	1.46
15	−7	1.01
16	−8	1.23
17	−11	2.00
18	−1	0.11
19	−1	0.11
20	−14	3.06

Given the complexity of the engine system and the low availability of useful data, it is hard to predict the future state of the turbofan engine with high accuracy. However, the limited data that are available can still be fully exploited to predict the RUL. What we are able to estimate is the distribution of the operating life under constant degradation failure (not sudden failure). To achieve this objective, we used the KPCA method to reduce the dimension of the raw data and established a degradation model and a health index equation to estimate the lifetime.

Conclusion

This article proposes a model for estimating the RUL of the turbofan engine and contributes to modeling and estimation in the field of engine prediction. We assume that a commonly used wear equation is available and connect it with the performance variation of the parameters, which enables us to assess cumulative degradation and predict the RUL of the engine. KPCA is used as data preprocessing for the model to reduce the dimension of the inputs, and the KPCA outputs retain more than 90% of the information in the original data. Maximum likelihood estimation is used to estimate the unknown parameters of the degradation model. The case study shows that this method achieves high precision in RUL estimation. The proposed KPCA-based degradation model integrates the physical model and the data-driven model, allowing it to capture the features of different types of gas turbine engines even if they have different mechanical structures. The proposed model could reduce engine maintenance costs and increase engine service life, because it indicates when and where to perform engine maintenance. Should the model be used in industry, it could provide an estimate of engine product quality in a short time. However, this work also has some shortcomings that should be studied in the future. For example, we assume that the health index model of the engine system covers only constant faults, whereas sudden faults and environmental factors should also be considered in the KPCA-based degradation model, both in performance estimation and in prediction. Further investigation might focus on including more factors in the KPCA-based degradation model so that it more closely resembles the real turbofan engine system. There is also a need to demonstrate and analyze the degradation model using real engine operating data and to adapt the model to a real system.

Acknowledgements

The authors thank the anonymous reviewers for their critical and constructive comments. The authors are grateful to the reviewers of Advances in Mechanical Engineering for their valuable suggestions and feedback for improving this article and to the editors for their meticulous processing.

Academic Editor: Hyung Hee Cho

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

1. Guo. Study on the recognition of aero-engine blade-casing rubbing fault based on the casing vibration acceleration. Measurement 2015; 65: 71–80.

2. Nilkitsaranont. Gas turbine performance prognostic for condition-based maintenance. Appl Energ 2009; 86: 2152–2161.

3. Y-F, Huang H-Z, Zhang. Fuzzy sets method of reliability prediction and its application to a turbocharger of diesel engines. Adv Mech Eng 2013; 2013: Article ID 216192 (7 pp.).

4. Vanini ZNS, Khorasani, Meskin. Fault detection and isolation of a dual spool gas turbine engine using dynamic neural networks and multiple model approach. Inform Sciences 2014; 259: 234–251.

5. Basir, Yuan. Engine fault diagnosis based on multi-sensor information fusion using Dempster–Shafer evidence theory. Inform Fusion 2007; 8: 379–386.

6. Chen, Zhang, Tang H-L. A probabilistic design methodology for a turboshaft engine overall performance analysis. Adv Mech Eng 2014; 2014: Article ID 976853 (12 pp.).

7. Lei, Lin. A review on empirical mode decomposition in fault diagnosis of rotating machinery. Mech Syst Signal Pr 2013; 35: 108–126.

8. Surender, Ganguli. Adaptive myriad filter for improved gas turbine condition monitoring using transient data. J Eng Gas Turb Power 2005; 127: 329–339.

9. Diao, Passino KM. Fault diagnosis for a turbine engine. Control Eng Pract 2004; 12: 1151–1165.

10. Sun, Cao. A non-probabilistic metric derived from condition information for operational reliability assessment of aero-engines. IEEE T Reliab 2015; 64: 167–188.

11. Yan, Robert, Gao XC. Wavelets for fault diagnosis of rotary machines: a review with applications. Signal Process 2014; 96: 1–15.

12. Martha, Zaidan, Harrison. Bayesian hierarchical models for aerospace gas turbine engine prognostics. Expert Syst Appl 2015; 42: 539–553.

13. Baraldi, Mangili, Zio. Investigation of uncertainty treatment capability of model-based and data-driven prognostic methods using simulated data. Reliab Eng Syst Safe 2013; 112: 94–108.

14. Kwon, Frangopol DM. Bridge fatigue reliability assessment using probability density functions of equivalent stress range based on field monitoring data. Int J Fatigue 2010; 32: 1221–1232.

15. Jalan, Mohanty AR. Model based fault diagnosis of a rotor–bearing system for misalignment and unbalance under steady-state condition. J Sound Vib 2009; 327: 604–622.

16. Saxena, Goebel, Simon. Damage propagation modeling for aircraft engine run-to-failure simulation. In: International conference on PHM 2008, Denver, CO, 6–9 October 2008, pp.1–9. New York: IEEE.

17. Meskin, Naderi, Khorasani. A multiple model-based approach for fault diagnosis of jet engines. IEEE T Contr Syst T 2013; 21: 254–262.

18. Naderi, Meskin, Khorasani. Nonlinear fault diagnosis of jet engines by using a multiple model-based approach. J Eng Gas Turb Power 2012; 134: 011602.

19. Wang. PHM-oriented integrated fusion prognostics for aircraft engines based on sensor data. IEEE Sens J 2014; 14: 1124–1132.

20. Ying, Cao. Study on gas turbine engine fault diagnostic approach with a hybrid of gray relation theory and gas-path analysis. Adv Mech Eng 2016; 8: 1–14.

21. Nguyen, Golinval JC. Fault detection based on kernel principal component analysis. Eng Struct 2010; 32: 3683–3691.

22. Kuang, Zhang. A novel hybrid KPCA and SVM with GA model for intrusion detection. Appl Soft Comput 2014; 18: 178–184.

23. Zhang, Yang SX. An adaptive approach based on KPCA and SVM for real-time fault diagnosis of HVCBs. IEEE T Power Deliver 2011; 26: 1960–1971.

24. Zhang. Monitoring of time-varying processes using kernel independent component analysis. Chem Eng Sci 2013; 88: 23–32.

25. Kruger, Xie. Adaptive KPCA modeling of nonlinear systems. IEEE T Signal Proces 2015; 63: 2364–2376.

26. Xiao Z-Z, Kim L-S. Optimization scheme of genetic algorithm and its application on aeroengine fault diagnosis. Int J Precis Eng Man 2015; 16: 735–741.

27. Maio, Tsui, Zio. Combining relevance vector machines and exponential regression for bearing residual life estimation. Mech Syst Signal Pr 2012; 31: 405–427.

28. Zio, Maio FD. A data-driven fuzzy approach for predicting the remaining useful life in dynamic failure scenarios of a nuclear system. Reliab Eng Syst Safe 2010; 95: 49–57.

29. Cai, Chen. Operation reliability assessment for cutting tools by applying a proportional covariate model to condition monitoring information. Sensors 2012; 12: 12964–12987.

30. Tran, Pham, Yang. Machine performance degradation assessment and remaining useful life prediction using proportional hazard model and support vector machine. Mech Syst Signal Pr 2012; 32: 320–330.

31. Qiu, Lee, Lin. Wavelet filter-based weak signature detection method and its application on roller bearing prognostics. J Sound Vib 2006; 289: 1066–1090.

32. Lee, Zhao. Prognostics and health management design for rotary machinery systems—reviews, methodology and applications. Mech Syst Signal Pr 2014; 42: 314–334.

33. Jin, Matthews, Fan. Physics of failure-based degradation modeling and lifetime prediction of the momentum wheel in a dynamic covariate environment. Eng Fail Anal 2013; 28: 222–240.