Sage Journals: Discover world-class research

Abstract

Addressing the multi-stage characteristics of rolling bearing degradation with random change points, this paper proposes a novel method for predicting the Remaining Useful Life (RUL) of multi-stage degradation processes. Initially, the prior parameters of each stage model are estimated using offline historical data. Then, for a single online device, real-time change point detection is performed using the Bayesian change point detection method. The Bayesian updating approach is adopted to update the parameters of the first stage before the change point occurs and the second stage after the change point. Subsequently, the multi-stage model is utilized for RUL prediction. Numerical simulations and case studies have shown that the rolling bearing life prediction method based on Bayesian change point detection can improve change point detection accuracy by 85%, thereby achieving high-precision multi-stage RUL prediction.

Keywords

remaining useful life prediction; rolling bearing; Bayesian change point detection; random degradation equipment

Introduction

Rolling bearings, as the most common and extremely crucial rotating components in industrial equipment, are widely applied in rotating machinery such as motors, gearboxes, and turbines. Whether they operate normally directly affects the performance of the entire equipment.^1,2 During operation, internal factors (such as sudden changes in the degradation mechanism) or external factors (changes in load, working conditions, etc.) change the degradation characteristics of rolling bearings, so the degradation process often exhibits multi-stage characteristics.^3,4 As shown in Figure 1, three typical stages can be observed during the operation of this bearing: an early stage of slow degradation, a middle stage of moderate degradation, and a late stage of rapid degradation.

Figure 1.

The three-stage degradation process of rolling bearing.

The Wiener process is a type of diffusion process with a linear drift term. Due to its excellent mathematical properties, it has been widely used in equipment reliability prediction and life analysis since the 1990s.^5,6 Progress has continued in recent years. Peng and Tseng⁷ established a cumulative degradation model, studied degradation modeling and prediction within a linear framework, and derived the explicit solution of the life distribution. Lee and Tang⁸ proposed an improved EM algorithm to estimate the parameters of the degradation process. Joseph and Yu⁹ studied a nonlinear degradation model that can be linearized under certain conditions, and also studied degradation modeling with correlated data. However, all the above methods rely on historical monitoring data from the service lives of similar equipment and do not use the degradation data monitored during the operation of the in-service equipment. To overcome this problem, Gebraeel et al.¹⁰ proposed a stochastic degradation modeling method under the Bayesian framework. After the degradation monitoring data are obtained, the Bayesian method is used to update the stochastic parameters of the model, obtain their posterior estimates, and then predict the probability distribution of the remaining useful life. Shi and Xue¹¹ proposed a method for predicting the remaining useful life that integrates the prior degradation data with the on-site degradation data of the equipment itself, and established an equipment degradation model described by a nonlinear Wiener process.

Many studies have addressed the degradation modeling and life prediction of equipment with typical multi-stage degradation characteristics.^12,13 Ng¹⁴ proposed a two-stage independent-increment degradation model based on a single change point. Wang et al.¹⁵ proposed a two-stage bearing degradation model combining the EM algorithm and the Kalman filter for adaptive prediction. Dong et al.¹⁶ proposed a method for predicting the Remaining Useful Life (RUL) based on a two-stage adaptive Wiener process. For the degradation modeling of multi-stage processes, the key lies in accurately detecting the points where the degradation behavior changes, that is, the change points.¹⁶ For the typical three-stage bearing degradation process, the time at which the second stage begins is generally taken as the initial fault point, also called the First Prediction Time (FPT), and degradation modeling targets the subsequent two stages (the middle and late stages). Figure 2 shows the typical two-stage degradation process of a rolling bearing. For a given failure threshold, its real life is point A in the figure. At present, the main problems in real-time online life prediction of multi-stage degradation processes are as follows:

Figure 2.

Error analysis of RUL based on two-stage degradation process.

If the change points in the degradation process are not taken into account, that is, the process is regarded as a single process, it will lead to a very large prediction error. For example, in Figure 2, the predicted life at point A1 has a huge gap from the real life at point A.

When considering the change points and treating the process as a two-stage process, the prediction accuracy of the change points is very important. For example, in Figure 2, owing to the prediction error of the change points, the estimated life prediction point is located at point A2, resulting in a large error.

At the same time, current multi-stage degradation modeling and prediction methods are not suitable for online life prediction: real-time change point detection and real-time remaining useful life prediction cannot be carried out.

Statistical change point analysis theory, which focuses on modeling abrupt changes in real-world processes, is a nonlinear statistical framework that has witnessed substantial advancements in both theoretical development and practical applications in recent years.^13,15,16

Current mainstream methodologies for change point detection include the Pettitt method,¹⁷ Bayesian testing approaches,¹⁸ and the cumulative sum (CUSUM) test.¹⁹ The Pettitt test, a nonparametric approach for change-point detection, examines the relative ranks of all pairwise comparisons within the time series. This distribution-free method is particularly valuable because it requires no prior knowledge of the underlying data distribution and remains robust across different time series types. However, the Pettitt test has drawbacks, including reduced accuracy in identifying multiple change points and heightened sensitivity to both the time series length and the position of the change point(s).¹⁷ The CUSUM test, on the other hand, requires a large dataset to construct its test statistics, which limits its accuracy for small sample sizes. In contrast, Bayesian change point detection methods, under appropriate prior assumptions, can identify change points with relatively high precision while meeting the requirements of real-time detection.

In response to the above analysis, this paper proposes a novel adaptive change point detection-based RUL prediction method for bearing degradation models. First, parameter learning is conducted using historical data to determine the prior values of change point distribution patterns, followed by the estimation of drift and diffusion coefficients for different stages. By applying change point detection to the collected degradation observation data, the observation sequence is partitioned into multiple segments. Starting from the FPT, online life prediction is performed. Both simulation experiments and real-world case studies validate the prediction accuracy and practical applicability of the proposed method.

Problem description

Let $X (t)$ represent the performance degradation amount of a rolling bearing at time $t$, and let $x_{0}$ denote the initial value of the degradation process. If a change point $τ$ exists in the degradation process, it divides the degradation sequence into two stages $x_{0}^{τ}$ and $x_{τ + 1}^{n}$, with corresponding degradation parameters $θ^{(1)}$ and $θ^{(2)}$ (the specific parameters depend on the established model). Let the failure threshold be $ω$; the equipment life $T_{ω}$ is²⁰:

T_{ω} = \inf \{t : X (t) \geq ω \mid x_{0} < ω\}

(1)

Let $t_{k}$ denote the current time, and let $x_{0}^{k}$ represent the degradation sequence acquired up to that time. The remaining life at time $t_{k}$ can then be written as:

L_{k} = \inf \{l_{k} : X (l_{k} + t_{k}) \geq ω \mid X (t_{k}) < ω\}

(2)

The tasks of real-time online life prediction are: Estimate the current change-point $τ$ and the model parameters $θ^{(1)}$ and $θ^{(2)}$ before and after the change-point in real-time; Considering two cases of $t_{k} \leq τ$ and $t_{k} > τ$ , estimate the remaining life $l_{k}$ of the current time $t_{k}$ .

Two-stage degradation process model

For the typical three-stage rolling bearing degradation process, since the first stage operates smoothly, remaining life prediction is generally not performed. This paper establishes a two-stage degradation process model focusing on the latter two stages of rolling bearings. The following assumptions are made:

The rolling bearing degradation process starts from the FPT, with all prediction objects having two typical degradation stages, namely the moderate degradation stage (Stage 1) and the rapid degradation stage (Stage 2); Each stage is a linear Wiener process.

Two-stage Wiener linear model

Based on the above assumptions, the studied degradation process model is expressed as:

X (t) = \begin{cases} x_{0} + μ_{1} t + σ_{1} B (t), & 0 < t \leq τ \\ x_{τ} + μ_{2} (t - τ) + σ_{2} B (t - τ), & τ < t \end{cases}

(3)

In the formula, $μ = [μ_{1}, μ_{2}]$ and $σ = [σ_{1}, σ_{2}]$ are the drift and diffusion coefficients of stages 1 and 2, respectively; $τ$ and $x_{τ}$ are the occurrence time of the change point and the corresponding degradation amount. $x_{0}$ is the initial degradation amount of the device, which is generally set to 0, and $B (t)$ is standard Brownian motion. The randomness of the Brownian motion reflects the uncertainty in the prediction of remaining life.

The RUL distribution of the two-stage degradation process

Based on the above assumptions, the RUL Probability Density Function (PDF) of the two-stage degradation process model studied in this paper can be expressed as²⁰:

f (t) = \begin{cases} \frac{ω - x_{0}}{\sqrt{2 π t^{3} σ_{1}^{2}}} \exp (A_{0}), & 0 < t \leq τ \\ \int_{- \infty}^{ω} \frac{ω - x_{τ}}{\sqrt{2 π (t - τ)^{3} σ_{2}^{2}}} \exp (B_{0}) g_{τ} (x_{τ}) d x_{τ}, & τ < t \end{cases}

(4)

where $A_{0} = - \frac{(ω - x_{0} - μ_{1} t)^{2}}{2 t σ_{1}^{2}}$, $B_{0} = - \frac{(ω - x_{τ} - μ_{2} (t - τ))^{2}}{2 (t - τ) σ_{2}^{2}}$, and $g_{τ} (x_{τ})$ is the transition probability²⁰ of moving from 0 to $x_{τ}$ over time $τ$, given by:

g_{τ} (x_{τ}) = \frac{1}{\sqrt{2 π τ σ_{1}^{2}}} \left\{\exp \left[- \frac{(x_{τ} - μ_{1} τ)^{2}}{2 τ σ_{1}^{2}}\right] - \exp \left(\frac{2 μ_{1} ω}{σ_{1}^{2}}\right) \exp \left[- \frac{(x_{τ} - 2 ω - μ_{1} τ)^{2}}{2 τ σ_{1}^{2}}\right]\right\}

(5)

The PDF at time $t_{k}$ is²⁰:

f_{L_{k}} (l_{k}) = \begin{cases} \frac{ω - x_{k}}{\sqrt{2 π l_{k}^{3} σ_{1}^{2}}} \exp \left\{- \frac{(ω - x_{k} - μ_{1} l_{k})^{2}}{2 l_{k} σ_{1}^{2}}\right\}, & 0 < t_{k} + l_{k} \leq τ \\ A_{1} - B_{1}, & t_{k} + l_{k} > τ \end{cases}

(6)

Where

\begin{aligned} A_{1} = {} & \sqrt{\frac{1}{2 π (l_{k} - τ + t_{k})^{2} (σ_{a}^{2} + σ_{b}^{2})}} \exp \left(- \frac{(μ_{a} + μ_{b})^{2}}{2 (σ_{a}^{2} + σ_{b}^{2})}\right) \\ & \times \left\{\frac{μ_{b} σ_{a}^{2} + μ_{a} σ_{b}^{2}}{σ_{a}^{2} + σ_{b}^{2}} Φ \left(\frac{μ_{a} σ_{b}^{2} + μ_{b} σ_{a}^{2}}{\sqrt{σ_{a}^{2} σ_{b}^{2} (σ_{a}^{2} + σ_{b}^{2})}}\right) + \sqrt{\frac{σ_{a}^{2} σ_{b}^{2}}{σ_{a}^{2} + σ_{b}^{2}}} ϕ \left(\frac{μ_{a} σ_{b}^{2} + μ_{b} σ_{a}^{2}}{\sqrt{σ_{a}^{2} σ_{b}^{2} (σ_{a}^{2} + σ_{b}^{2})}}\right)\right\} \end{aligned}

(7)

\begin{aligned} B_{1} = {} & \exp \left\{\frac{2 μ_{1 P} (ω - x_{k})}{σ_{1}^{2}} + \frac{2 [(ω - x_{k})^{2} σ_{1 P}^{4} τ + (ω - x_{k})^{2} σ_{1 P}^{2} σ_{1}^{2}]}{[σ_{1}^{2} + (τ - l_{k}) σ_{1 P}^{2}] σ_{1}^{4}}\right\} \\ & \times \frac{\exp \left(- 2 \frac{(μ_{a} - μ_{b})^{2}}{σ_{a}^{2} + σ_{b}^{2}}\right)}{2 π (l_{k} - τ + t_{k})^{2} (σ_{a}^{2} + σ_{b}^{2})} \\ & \times \left\{\frac{μ_{c} σ_{a}^{2} + μ_{a} σ_{b}^{2}}{σ_{a}^{2} + σ_{b}^{2}} Φ \left(\frac{μ_{a} σ_{b}^{2} + μ_{b} σ_{a}^{2}}{\sqrt{σ_{a}^{2} σ_{b}^{2} (σ_{a}^{2} + σ_{b}^{2})}}\right) + \sqrt{\frac{σ_{a}^{2} σ_{b}^{2}}{σ_{a}^{2} + σ_{b}^{2}}} ϕ \left(\frac{μ_{c} σ_{a}^{2} + μ_{a} σ_{b}^{2}}{\sqrt{σ_{a}^{2} σ_{b}^{2} (σ_{a}^{2} + σ_{b}^{2})}}\right)\right\} \end{aligned}

(8)

$Φ (\cdot)$ is the cumulative distribution function of the standard normal distribution, $ϕ (\cdot)$ is the probability density function of the standard normal distribution.

Where $μ_{a} = μ_{2 P} (l_{k} - τ + t_{k})$ , $μ_{b} = ω - x_{k} - μ_{1 P} (τ - t_{k})$ , $μ_{c} = - ω + x_{k} - μ_{1 P} (τ - t_{k}) - 2 ω (τ - l_{k}) σ_{1 P}^{2} / σ_{1}^{2}$ , $σ_{a}^{2} = σ_{2}^{2} (l_{k} - τ + t_{k}) + σ_{2 P}^{2} (l_{k} - τ + t_{k})^{2}, σ_{b}^{2} = σ_{1}^{2} (τ - t_{k}) + σ_{1 P}^{2} (τ - t_{k})^{2}$ .
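As a numerical sanity check on the first branch of equation (6), the following minimal sketch evaluates the single-stage first-passage density for a fixed drift and verifies that it integrates to one with mean $(ω - x_{k}) / μ$. The function names, the trapezoidal integrator, and the example values are illustrative assumptions; the random effect of the drift and the second branch $A_{1} - B_{1}$ are not covered.

```python
import math

def rul_pdf_single_stage(l, x_k, omega, mu, sigma):
    """First branch of equation (6): first-passage density for a fixed drift mu."""
    if l <= 0:
        return 0.0
    d = omega - x_k  # remaining distance to the failure threshold
    return d / math.sqrt(2 * math.pi * l ** 3 * sigma ** 2) * math.exp(
        -(d - mu * l) ** 2 / (2 * l * sigma ** 2))

def integrate(f, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi] (hypothetical helper)."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

# Illustrative values: current level 120, threshold 200, drift 2.5, diffusion 0.8.
pdf = lambda l: rul_pdf_single_stage(l, 120.0, 200.0, 2.5, 0.8)
mass = integrate(pdf, 0.0, 100.0)
mean_rul = integrate(lambda l: l * pdf(l), 0.0, 100.0)
print(round(mass, 4), round(mean_rul, 2))
```

For a positive drift the density is a proper distribution, so the integral serves as a quick check that the formula has been transcribed correctly.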

Offline parameter estimation

Suppose there are $n$ rolling bearings in the same batch, giving $n$ sets of degradation data $X = [x_{1}, x_{2}, \dots, x_{n}]$ , where $x_{i} = [x_{i, 0}, x_{i, 1}, x_{i, 2}, \dots, x_{i, m_{i}}]$ represents the monitoring values of the $i$ -th bearing at times $[t_{i, 0}, t_{i, 1}, t_{i, 2}, \dots, t_{i, m_{i}}]$ , and $m_{i}$ is the number of monitoring values for the $i$ -th bearing. All bearings are monitored at equal intervals $Δ t = t_{i, j} - t_{i, j - 1}$ . Assume that the parameters $μ_{1}$ and $μ_{2}$ of the two stages follow Gaussian distributions $N (μ_{1 P}, σ_{1 P}^{2})$ and $N (μ_{2 P}, σ_{2 P}^{2})$ , and that the change point $τ_{i}$ follows a $Gamma (α, β)$ distribution with shape parameter $α$ and scale parameter $β$ .²⁰

Using MLE (Maximum Likelihood Estimation) to estimate its parameters, first let $τ_{i}$ be the change point of device $i$ , and construct the likelihood function as shown in equation (9):

\begin{aligned} \ln L (μ_{1, i}, σ_{1}, μ_{2, i}, σ_{2}, τ_{i} \mid x_{i}) = {} & \sum_{j = 1}^{τ_{i}} \ln \left\{\frac{1}{σ_{1} \sqrt{2 π Δ t}} \exp \left[- \frac{(x_{i, j} - x_{i, j - 1} - μ_{1, i} Δ t)^{2}}{2 Δ t σ_{1}^{2}}\right]\right\} \\ & + \sum_{j = τ_{i} + 1}^{m_{i}} \ln \left\{\frac{1}{σ_{2} \sqrt{2 π Δ t}} \exp \left[- \frac{(x_{i, j} - x_{i, j - 1} - μ_{2, i} Δ t)^{2}}{2 Δ t σ_{2}^{2}}\right]\right\} \end{aligned}

(9)

The maximum likelihood estimate of parameter ${\hat{Θ}}_{i} = ({\hat{μ}}_{1, i}, {\hat{μ}}_{2, i}, {\hat{σ}}_{1}^{2}, {\hat{σ}}_{2}^{2}, {\hat{τ}}_{i}), i = 1, 2, \dots, n$ is²⁰:

{\hat{Θ}}_{i} = \underset{Θ}{\arg\max} \ln L (μ_{1, i}, σ_{1}, μ_{2, i}, σ_{2}, τ_{i} \mid x_{i})

(10)

Using the estimates ${\hat{Θ}}_{i}$ from multiple devices, the estimates of $σ_{1}, σ_{2}, μ_{1 P}, μ_{2 P}, τ$ can be obtained.
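The maximization in equation (10) can be illustrated for a single bearing by a profile-likelihood search over candidate change points. The sketch below is a simplified assumption, not the procedure used in this paper: it estimates the drift and diffusion of each segment from that bearing alone (the model treats $σ_{1}$ and $σ_{2}$ as common to all bearings), and the names `segment_loglik`, `fit_two_stage`, and `min_len` are hypothetical.

```python
import math
import random

def segment_loglik(increments, dt):
    """Log-likelihood of one segment of equation (9), evaluated at its own MLE."""
    n = len(increments)
    mu = sum(increments) / (n * dt)                                # drift MLE
    var = sum((d - mu * dt) ** 2 for d in increments) / (n * dt)   # sigma^2 MLE
    var = max(var, 1e-12)                                          # avoid log(0)
    # At the MLE the quadratic term of the Gaussian log-density sums to -n/2.
    return -0.5 * n * (math.log(2 * math.pi * dt * var) + 1.0), mu, math.sqrt(var)

def fit_two_stage(x, dt=1.0, min_len=5):
    """Return (tau, mu1, sigma1, mu2, sigma2) maximizing the two-segment likelihood.

    tau counts the increments assigned to stage 1, as in the first sum of equation (9).
    """
    inc = [b - a for a, b in zip(x, x[1:])]
    best = None
    for tau in range(min_len, len(inc) - min_len + 1):
        ll1, mu1, s1 = segment_loglik(inc[:tau], dt)
        ll2, mu2, s2 = segment_loglik(inc[tau:], dt)
        if best is None or ll1 + ll2 > best[0]:
            best = (ll1 + ll2, tau, mu1, s1, mu2, s2)
    return best[1:]

# Illustrative synthetic path: 120 stage-1 increments followed by 80 stage-2 increments.
random.seed(1)
demo_inc = ([random.gauss(0.8, 0.5) for _ in range(120)]
            + [random.gauss(2.5, 0.8) for _ in range(80)])
demo_x = [sum(demo_inc[:i]) for i in range(len(demo_inc) + 1)]
print(fit_two_stage(demo_x))
```

Repeating this fit for every bearing in the batch would give the per-device estimates from which the population-level parameters are then derived.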

The PDF of degradation increments

For the observations $x_{0}^{n} = [x_{0}, x_{1}, x_{2}, \dots, x_{n}]$ of a Wiener process with drift, define the increment sequence $Δ x_{1}^{n} = [Δ x_{1}, Δ x_{2}, \dots, Δ x_{n}]$ , with $Δ x_{i} = x_{i} - x_{i - 1}$ and $Δ t_{i} = t_{i} - t_{i - 1}, i = 1, 2, \dots, n$ , where $Δ x_{i}$ and $Δ t_{i}$ are the performance degradation increments and time increments, respectively. Because the increments of a Wiener process are independent and Gaussian, $Δ x_{i} \sim N (μ Δ t_{i}, σ^{2} Δ t_{i})$ .

Then PDF of $Δ x_{i}$ is:

f (Δ x_{i} | μ, σ) = \frac{1}{σ \sqrt{2 π Δ t_{i}}} \exp (- \frac{{(Δ x_{i} - μ Δ t_{i})}^{2}}{2 Δ t_{i} σ^{2}})

(11)

Here, the prior distribution of $θ = (μ, σ^{2})$ is a normal-inverse gamma distribution,²¹ that is, $θ \sim N / IGa (m, v, a, B)$ .

If the hyperparameters (m, v, a, B) are defined, then:

p (θ \mid m, v, a, B) = \exp \left\{- \frac{v μ^{2}}{2 σ^{2}} + \frac{v m μ}{σ^{2}} - \frac{B}{σ^{2}} - \frac{v m^{2}}{2 σ^{2}} + \left(a - \frac{1}{2}\right) \ln \frac{1}{σ^{2}} - \ln Z_{NIG} (η)\right\}

(12)

where $Z_{NIG} (η) = {(2 π / v)}^{1 / 2} {| B |}^{- a} Γ (a)$ , $Γ (a)$ is the gamma function.

Bayesian change point detection

Bayesian change point detection is an online change point detection algorithm¹⁸ that determines whether a change point $t_{c}$ has occurred by generating an accurate distribution of the next unseen datum in the sequence from the data observed so far. The increment degradation sequence $Δ x_{1}^{n}$ introduced in Section “The PDF of degradation increments” is used for change point detection.

Change point detection method

Bayesian change point detection models the run length $r_{t}$ elapsed since the last change point,¹⁸ $r_{t} = [r_{1 t}, r_{2 t}, \dots, r_{kt}, \dots, r_{tt}]^{T}$ , with $1 \leq k \leq t$ , where $r_{kt} = 1$ indicates that moment $k$ is a change point. Given that the run length at moment $t - 1$ is $r_{t - 1}$ , the run length $r_{t}$ at the next moment $t$ can take only two values: if the sequence continues to follow the original distribution, $r_{t} = r_{t - 1} + 1$ ; if a change point occurs at moment $t$ , $r_{t} = 1$ .

The probability that the degradation pattern of the device changes (i.e. a change point $r_{t} = 1$ occurs) is $H (r_{t})$ , and when the duration of the degradation pattern follows a certain distribution, $H (r_{t})$ is the hazard function of that distribution, then¹⁸:

p (r_{t} \mid r_{t - 1}) = \begin{cases} H (r_{t - 1} + 1), & r_{t} = 1 \\ 1 - H (r_{t - 1} + 1), & r_{t} = r_{t - 1} + 1 \\ 0, & \text{otherwise} \end{cases}

(13)

Here, $H (τ) = \frac{P_{gap} (g = τ)}{\sum_{t = τ}^{\infty} P_{gap} (g = t)}$ is the hazard function.¹⁸ In the special case where $P_{gap} (g)$ is a discrete exponential (geometric) distribution with timescale $λ$ , the process is memoryless and the hazard function is constant, $H (τ) = 1 / λ$ .
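The constant-hazard property can be checked numerically. The sketch below assumes a geometric gap distribution on $g = 1, 2, \dots$ with success probability $1 / λ$ and truncates the infinite sum at a large horizon; `hazard`, `geometric_pmf`, and `horizon` are illustrative names.

```python
def geometric_pmf(g, lam):
    """P(g) for a geometric gap distribution on g = 1, 2, ... with mean lam."""
    p = 1.0 / lam
    return p * (1.0 - p) ** (g - 1)

def hazard(tau, lam, horizon=10_000):
    """H(tau) = P(g = tau) / sum_{t >= tau} P(g = t), with the sum truncated at horizon."""
    tail = sum(geometric_pmf(t, lam) for t in range(tau, horizon))
    return geometric_pmf(tau, lam) / tail

# The hazard is (numerically) the same at every tau and equals 1 / lam.
print(hazard(1, 150.0), hazard(50, 150.0), 1.0 / 150.0)
```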

Marginalizing over the run length $r_{t}$ gives the predictive distribution:

p (Δ x_{t + 1} ∣ Δ x_{1}^{t}) = \sum_{r_{t}} p (Δ x_{t + 1} ∣ r_{t}, Δ x_{k}^{t}) p (r_{t} ∣ Δ x_{1}^{t})

(14)

Here, $Δ x_{k}^{t}$ denotes the $r_{t} = t - k + 1$ most recent observations $Δ x_{k}, Δ x_{k + 1}, \dots, Δ x_{t - 1}, Δ x_{t}$ associated with the current run at time $t$ . The posterior probability $p (r_{t} ∣ Δ x_{1}^{t})$ is:

p (r_{t} ∣ Δ x_{1}^{t}) = \frac{p (r_{t}, Δ x_{1}^{t})}{p (Δ x_{1}^{t})}

(15)

The joint probability $p (r_{t}, Δ x_{1}^{t})$ can be expressed in a recursive form:

\begin{aligned} p (r_{t}, Δ x_{1}^{t}) & = \sum_{r_{t - 1}} p (r_{t}, r_{t - 1}, Δ x_{1}^{t}) \\ & = \sum_{r_{t - 1}} p (r_{t} \mid r_{t - 1}) p (Δ x_{t} \mid r_{t - 1}, Δ x_{k}^{t}) p (r_{t - 1}, Δ x_{1}^{t - 1}) \end{aligned}

(16)

From equations (15) and (16), we obtain the recursive formula for the posterior probability $p (r_{t} ∣ Δ x_{1}^{t})$ ¹⁸:

p (r_{t} ∣ Δ x_{1}^{t}) = \sum_{r_{t - 1}} p (Δ x_{t} ∣ r_{t}, Δ x_{k}^{t}) p (r_{t} ∣ r_{t - 1}) p (r_{t - 1} ∣ Δ x_{1}^{t - 1})

(17)

Note that, according to the properties of conditional independence, the predictive distribution $p (Δ x_{t} ∣ r_{t - 1}, Δ x_{1}^{t})$ depends only on the most recent data $Δ x_{k}^{t}$ , and uses the properties of the conjugate exponential distribution family for simplified calculations.

According to equation (17), the change point $t_{C}^{(t)}$ detected at the current time $t$ is²¹:

t_{C}^{(t)} = \underset{k}{\arg\max} \{p (r_{kt} \mid Δ x_{1}^{t}), 1 \leq k \leq t\}

(18)

Conjugate exponential distribution

For computational convenience, assume that the degradation increment $Δ x_{i}$ is generated by an exponential-family distribution with a conjugate prior.¹⁸ The prior distribution $p (θ ∣ η)$ and the posterior distribution $p (θ ∣ Δ x_{i})$ of the parameter $θ$ then have the same form, and the hyperparameters satisfy the following relationship:

η_{post} = η + \sum_{i = 1}^{n} u (Δ x_{i})

(19)

The marginal likelihood of an increment, obtained by integrating out $θ$ , is:

p (Δ x_{i} ∣ η) = \int p (Δ x_{i} ∣ θ) p (θ ∣ η) d θ

(20)

The resulting marginal distribution is a Student $T$ distribution, that is, $Δ x_{i} \mid η \sim T (Δ x_{i} \mid m, (v + 1) B / (v a), 2 a)$ . Take $φ (θ) = [- μ^{2} / 2 σ^{2}, μ / σ^{2}, - 1 / σ^{2}, - \log σ^{2}]$ and $η = [v, v m, B + v m^{2} / 2, a - 1 / 2]$ as the natural parameters and hyperparameters, respectively, and $u (Δ x_{i}) = [1, Δ x_{i}, Δ x_{i}^{2} / 2, 1 / 2]$ as the sufficient statistics. The recursive formula for the posterior hyperparameters is then:

\begin{cases} v_{t + 1} = v_{t} + 1, & B_{t + 1} = B_{t} + \frac{v_{t} (m_{t} - Δ x_{t + 1})^{2}}{2 (v_{t} + 1)} \\ a_{t + 1} = a_{t} + \frac{1}{2}, & m_{t + 1} = \frac{v_{t} m_{t} + Δ x_{t + 1}}{v_{t} + 1} \end{cases}

(21)

Thus, the recursive formula for the predicted distribution probability $p (Δ x_{t} ∣ η_{t - 1}^{(k)})$ can be obtained:

p (Δ x_{t} \mid η_{t - 1}^{(k)}) \sim T \left(Δ x_{t} \mid m_{t - 1}^{(k)}, \frac{(v_{t - 1}^{(k)} + 1) B_{t - 1}^{(k)}}{v_{t - 1}^{(k)} a_{t - 1}^{(k)}}, 2 a_{t - 1}^{(k)}\right)

(22)

In the equation, $η_{t - 1}^{(k)}$ represents the posterior distribution hyperparameters calculated based on $Δ x_{k}^{t}$ .

Bayesian change point detection algorithm

Algorithm 1: Bayesian Change Point Detection Algorithm:

Step 1: Initialize $p (r_{1}) = 1$ , estimate the offline parameters from the offline data using equation (10), and use them to set the initial hyperparameters $m_{0}, v_{0}, a_{0}, B_{0}$ ;

Step 2: Obtain a new degradation increment $Δ x_{t}$ ;

Step 3: Calculate the predictive probability $π_{t}^{(r_{t})} = p (Δ x_{t} ∣ η_{t - 1}^{(k)})$ of $Δ x_{t}$ according to equation (22);

Step 4: Calculate the growth probability $p (r_{t} = r_{t - 1} + 1, Δ x_{1}^{t})$ for $(r_{t} = r_{t - 1} + 1)$ according to equation (16);

Step 5: Calculate the change point probability $p (r_{t} = 1, Δ x_{1}^{t})$ for $(r_{t} = 1)$ according to equation (16);

Step 6: Calculate the evidence $p (Δ x_{1}^{t}) = \sum_{r_{t}} p (r_{t}, Δ x_{1}^{t})$ by summing the joint probabilities over all run lengths;

Step 7: Determine the posterior distribution $p (r_{t} ∣ Δ x_{1}^{t}) = p (r_{t}, Δ x_{1}^{t}) / p (Δ x_{1}^{t})$ of the run length $r_{t}$ according to equation (15);

Step 8: Determine the change point $t_{C}^{(t)}$ according to equation (18);

Step 9: Update the hyperparameters according to equation (21) and return to Step 2.
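The steps of Algorithm 1 can be sketched compactly as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in this paper: the hazard is taken as the constant $1 / λ$, list index $j$ stands for a run that has absorbed $j$ increments (so index 0 plays the role of a fresh segment), and the names `student_t_pdf`, `bocpd`, and the example hyperparameters are hypothetical.

```python
import math
import random

def student_t_pdf(x, loc, scale2, dof):
    """Student-t density with location loc, squared scale scale2 and dof degrees of freedom."""
    z2 = (x - loc) ** 2 / scale2
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi * scale2))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(z2 / dof))

def bocpd(increments, m0, v0, a0, b0, lam):
    """Return, for every step, the start index of the most probable current segment."""
    hazard = 1.0 / lam
    post = [1.0]                       # run-length posterior (Step 1)
    params = [(m0, v0, a0, b0)]        # one hyperparameter set per run length
    starts = []
    for t, dx in enumerate(increments):            # Step 2: new increment
        # Step 3: predictive probability under each run length, equation (22).
        pred = [student_t_pdf(dx, m, (v + 1) * b / (v * a), 2 * a)
                for (m, v, a, b) in params]
        # Steps 4-5: growth and change point probabilities.
        growth = [p * pi * (1.0 - hazard) for p, pi in zip(post, pred)]
        change = sum(p * pi * hazard for p, pi in zip(post, pred))
        joint = [change] + growth
        evidence = sum(joint)                      # Step 6
        post = [j / evidence for j in joint]       # Step 7
        # Step 9: hyperparameter recursion, equation (21); a fresh run restarts from the prior.
        params = [(m0, v0, a0, b0)] + [
            ((v * m + dx) / (v + 1), v + 1, a + 0.5,
             b + v * (m - dx) ** 2 / (2 * (v + 1)))
            for (m, v, a, b) in params]
        # Step 8: most probable run length, converted to a segment start index.
        run = max(range(len(post)), key=post.__getitem__)
        starts.append(t + 1 - run)
    return starts

# Illustrative data: the drift jumps from 0.8 to 2.5 after 120 increments.
random.seed(6)
demo = ([random.gauss(0.8, 0.5) for _ in range(120)]
        + [random.gauss(2.5, 0.8) for _ in range(80)])
print(bocpd(demo, m0=0.8, v0=1.0, a0=1.0, b0=0.25, lam=150.0)[-1])
```

The sketch keeps every run length, so its cost grows quadratically with the sequence length; a practical implementation would likely prune run lengths with negligible posterior mass.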

Bayesian change point detection based RUL prediction

RUL prediction flowchart

Figure 3 illustrates the flowchart of the Bayesian change point detection-based RUL prediction method for rolling bearings. The process begins with offline historical data from multiple homogeneous bearings to construct a multi-stage degradation model, where prior parameters are estimated via maximum likelihood estimation (MLE).

Figure 3.

Flowchart of Bayesian change point detection-based RUL prediction.

For online prediction, real-time monitoring data from a target bearing are used to compute degradation increments. The Bayesian framework then evaluates whether a change point (e.g. a degradation stage transition) has occurred by calculating posterior probabilities and separating growth and change point probabilities. If a change point is detected, the model parameters are updated in real time. Finally, the RUL is predicted as a PDF based on the first passage time of the degradation trajectory crossing a failure threshold. This approach integrates offline learning with online adaptation, enhancing RUL prediction accuracy for rolling bearings.

Online parameter update

For real-time RUL prediction of a single rolling bearing, Bayesian theory can be employed for online parameter updating to make full use of the data collected online. Let the bearing be monitored at the current time $t_{k}$ , and let the data collected from time 1 to $t_{k}$ be $x_{1}^{k}$ .

Let $Θ_{0} = [μ_{1, 0}, σ_{1, 0}, μ_{2, 0}, σ_{2, 0}, τ]$ be the prior information learned offline. Let the current device change point be $τ$ . When $t_{k} < τ$ , it indicates that the device is in the first degradation stage, and there is no degradation data for the second stage yet, thus the model parameters for the first stage $(μ_{1 P, 0}, σ_{1 P, 0})$ can be updated at this point. If $t_{k} > τ$ , then the change point has occurred, requiring only the update of the second stage parameters $(μ_{2 P}, σ_{2 P})$ .

When $t_{k} < τ$ , with prior $μ_{1} \sim N (μ_{1 P, 0}, σ_{1 P, 0}^{2})$ , the posterior distribution $p (μ_{1} ∣ x_{1}^{k})$ is computed as follows:

p (μ_{1} \mid x_{1}^{k}) = \frac{1}{σ_{1 P} \sqrt{2 π}} \exp \left\{- \frac{(μ_{1} - μ_{1 P})^{2}}{2 σ_{1 P}^{2}}\right\}

(23)

Where,

\begin{cases} μ_{1 P} = \frac{μ_{1 P, 0} σ_{1}^{2} + (x_{k} - x_{0}) σ_{1 P, 0}^{2}}{(t_{k} - t_{0}) σ_{1 P, 0}^{2} + σ_{1}^{2}} \\ σ_{1 P}^{2} = \frac{σ_{1}^{2} σ_{1 P, 0}^{2}}{(t_{k} - t_{0}) σ_{1 P, 0}^{2} + σ_{1}^{2}} \end{cases}

(24)

When $t_{k} > τ$ , only the parameters of $p (μ_{2} ∣ x_{1}^{k})$ need to be updated²⁰:

p (μ_{2} \mid x_{τ}^{k}) = \frac{1}{σ_{2 P} \sqrt{2 π}} \exp \left\{- \frac{(μ_{2} - μ_{2 P})^{2}}{2 σ_{2 P}^{2}}\right\}

(25)

Where

\begin{cases} μ_{2 P} = \frac{μ_{2 P, 0} σ_{2}^{2} + (x_{k} - x_{τ}) σ_{2 P, 0}^{2}}{(t_{k} - t_{τ}) σ_{2 P, 0}^{2} + σ_{2}^{2}} \\ σ_{2 P}^{2} = \frac{σ_{2}^{2} σ_{2 P, 0}^{2}}{(t_{k} - t_{τ}) σ_{2 P, 0}^{2} + σ_{2}^{2}} \end{cases}

(26)
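Equations (24) and (26) share one form, so a single helper can serve both stages. The sketch below is illustrative: `update_drift` and its argument names are assumptions, with `dx` standing for $x_{k} - x_{0}$ (or $x_{k} - x_{τ}$) and `dt` for the corresponding elapsed time.

```python
def update_drift(mu_prior, var_prior, sigma2, dx, dt):
    """Posterior mean and variance of the drift, following equations (24) and (26).

    mu_prior, var_prior: prior mean and variance of the drift
    sigma2: squared diffusion coefficient of the stage
    dx, dt: accumulated degradation and elapsed time within the stage
    """
    denom = dt * var_prior + sigma2
    mu_post = (mu_prior * sigma2 + dx * var_prior) / denom
    var_post = sigma2 * var_prior / denom
    return mu_post, var_post

# Stage 1 example with the simulation priors: 100 time units, 100 units of degradation.
print(update_drift(0.8, 0.12 ** 2, 0.5 ** 2, dx=100.0, dt=100.0))
```

With no elapsed time the posterior reduces to the prior, and as the observation window grows the posterior mean moves toward the empirical slope `dx / dt` while the variance shrinks.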

RUL prediction algorithm

Algorithm 2: Multi-stage Degradation RUL Prediction Algorithm:

Step 1: Based on multiple sets of historical data, use equation (10) to estimate the common parameters $σ_{1}, σ_{2}, μ_{1 P}, μ_{2 P}$ and the prior distribution of the change point, $τ \sim Gamma (α, β)$ .

Step 2: For a specific degrading device, observe the latest data $x_{k}$ at the current time $t_{k}$ . All observation time series are $x_{0}^{k}$ ; construct the differential time series $Δ x_{1}^{k}$ .

Step 3: Use Algorithm 1 to calculate $p (r_{t} ∣ Δ x_{1}^{t})$ , determine whether the current time $t_{k}$ is the latest change point, and process according to the change point:

If there are no change points, handle it in two cases:

Case 1: When $t_{k} < τ$ , take the prior expected value $E (τ)$ of the $Gamma (α, β)$ distribution as the change point, which is $α β$ for shape $α$ and scale $β$ (equal to $α$ when $β = 1$ );

Case 2: When $t_{k} \geq τ$ , update the change point to $t_{k}$ , and use formula (24) with the $x_{1}^{k}$ sequence to update the first-stage parameters $μ_{1 P}$ and $σ_{1 P}$ .

If a change point occurs at the current time $t_{k}$ , proceed to Step 4.

Step 4: Use the latest change point $t_{C}^{(t)}$ as the new starting point of the degradation trend, and update the stage 2 model parameters $(μ_{2 P}, σ_{2 P})$ using formula (26).

Step 5: According to formula (6), use all obtained parameters to calculate the remaining life PDF of the device.

Step 6: Return to Step 2 until the prediction process is complete.

Simulation validation

The numerical simulation model is given by equation (3), where $μ_{1}, μ_{2}$ and $σ_{1}, σ_{2}$ are the drift and diffusion coefficients of stages 1 and 2, and $τ$ is the random change point.

In the numerical simulation, the drift coefficients exhibit random effects, with $μ_{1}$ and $μ_{2}$ following Gaussian distributions $N (μ_{1 P}, σ_{1 P}^{2})$ and $N (μ_{2 P}, σ_{2 P}^{2})$ . The true parameters of the different stages are shown in Table 1. The change point $τ$ also has random effects, following a $Gamma (α, β)$ distribution with $α = 150, β = 1$ . The diffusion coefficients are constant, $σ_{1} = 0.5, σ_{2} = 0.8$ ; the second stage generally has greater uncertainty than the first. The degradation threshold is set to $ω = 200$ .
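A trajectory of this kind can be generated with a few lines of code. The sketch below is an illustrative assumption about how such a simulation might be set up (unit time step, simulation stopped at the first threshold crossing); `simulate_two_stage` and `max_steps` are hypothetical names, and the parameter values are the true values listed in Table 1.

```python
import random

def simulate_two_stage(mu1, sigma1, mu2, sigma2, tau, omega, rng=random, max_steps=100_000):
    """Simulate equation (3) on a unit time grid until omega is first reached."""
    x, t, path = 0.0, 0, [0.0]
    while x < omega and t < max_steps:
        # Stage 1 before the change point, stage 2 afterwards.
        mu, sigma = (mu1, sigma1) if t < tau else (mu2, sigma2)
        x += mu + sigma * rng.gauss(0.0, 1.0)   # Wiener increment with dt = 1
        t += 1
        path.append(x)
    return path

random.seed(6)
mu1 = random.gauss(0.80, 0.12)           # random drift of stage 1
mu2 = random.gauss(2.50, 0.12)           # random drift of stage 2
tau = random.gammavariate(150.0, 1.0)    # change point: shape 150, scale 1
path = simulate_two_stage(mu1, 0.5, mu2, 0.8, tau, omega=200.0)
print(len(path) - 1, round(path[-1], 2))
```

Repeating the draw of `mu1`, `mu2`, and `tau` for each sample would produce a population of trajectories with random effects in both the drifts and the change point.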

Table 1.

Offline parameter estimation results.

Parameter	True value	Estimated value	Parameter	True value	Estimated value
$μ_{1 P}$	0.80	0.81	$σ_{2 P}$	0.12	0.11
$σ_{1 P}$	0.12	0.14	$σ_{2}$	0.80	0.78
$σ_{1}$	0.50	0.42	$α$	150	148.2
$μ_{2 P}$	2.50	2.31	$β$	1	1.03

Figure 4 shows the simulated sample trajectories with a sample size of $n = 150$ , and Figure 5 shows the statistical distribution of the change points. The model parameters $α = 148.2$ and $β = 1.03$ were obtained by fitting a Gamma distribution to the simulated samples.

Figure 4.

Simulated degradation data (150 groups).

Figure 5.

Change point distribution.

First, the 149 samples other than sample 6 were used as offline data for prior parameter estimation, and the results are shown in Table 1. Figure 6 shows the degradation trend of sample 6, and Figure 7 displays the $Δ x$ incremental trajectory obtained by first-differencing sample 6. The prediction results of the Bayesian change point detection method are shown in Figure 8; the actual and predicted change points are 132 and 134, respectively, giving an error of 2.

Figure 6.

Degradation trajectory of sample 6.

Figure 7.

The degradation increment trajectory of sample 6.

Figure 8.

Prediction results of change points for sample 6 $[τ = 132 s]$ : (a) predicted mean and variance and (b) probability of change point.

To validate the superiority of the proposed change-point detection method, we conducted comparative experiments on the aforementioned 150 simulated sample groups using three different change-point detection methods: Method A (Bayesian approach), Method B (Pettitt’s non-parametric method), and Method C (CUSUM). Notably, Methods A and C were implemented with optimized parameters as specified in Table 2.

Table 2.

Change point detection results comparison.

Method	Initial parameters	[Maximum lead error, maximum lag error]	Maximum error range	Improvement over B	Improvement over C
A: Bayesian	$μ_{1} = 0.81, μ_{2} = 2.31, σ_{1} = 0.42, σ_{2} = 0.78, τ = 148.2$	[0, 6]	6	78.5%	85%
B: Pettitt	/	[28, 0]	28	/	/
C: CUSUM	warmup_period = 25, delta = 20, threshold = 5	[−20, 20]	40	/	/

Figure 9 presents the accuracy comparison of the three methods. In terms of prediction error ranges, the proposed Bayesian change-point detection method (Method A) demonstrated the highest precision, with errors confined to a lag of 0 to 6. The Pettitt method (Method B) showed lead errors of up to 28, while the CUSUM method (Method C) exhibited errors fluctuating between −20 and 20. Quantitatively, in terms of the maximum error range in Table 2, the Bayesian method achieved approximately 78.5% and 85% accuracy improvements over the Pettitt and CUSUM methods, respectively. Furthermore, Figure 9 reveals that as change points occur later in longer time series, both the Pettitt and CUSUM methods tend to produce larger prediction errors.
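The reported improvement percentages are consistent with a relative reduction in the maximum error range of Table 2. The short sketch below reproduces that arithmetic; interpreting the improvement this way is an inference from the reported numbers, and `improvement` is an illustrative name.

```python
# Maximum error ranges taken from Table 2.
max_error_range = {"Bayesian": 6, "Pettitt": 28, "CUSUM": 40}

def improvement(ours, baseline):
    """Relative reduction of the maximum error range, in percent."""
    return (baseline - ours) / baseline * 100.0

for name in ("Pettitt", "CUSUM"):
    gain = improvement(max_error_range["Bayesian"], max_error_range[name])
    print(f"Improvement over {name}: {gain:.2f}%")
```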

Figure 9.

Error analysis of three change-point detection methods.

Figure 10 analyzes the impact of change-point prediction errors on remaining useful life (RUL) estimation for change-point errors ranging from −12 to 12. The results show that: (1) RUL prediction errors increase with larger change-point detection errors, and (2) negative change-point prediction errors (predicted change points ahead of the actual change points) lead to larger RUL estimation errors.

Figure 10.

Impact of change point errors on RUL predictions.

Combining this with the analysis of Figure 9, the Bayesian change point detection method produces only lagging errors of relatively small magnitude, which helps improve RUL prediction accuracy compared with the Pettitt and CUSUM methods.

For sample 6, the degradation threshold is $ω = 200$. The parameters are updated in real time according to Algorithm 2 and substituted into the remaining-life PDF formula; the resulting RUL predictions at each moment and the PDF distribution are shown in Figures 11 and 12, respectively. The results show that once the change point has been accurately detected, the RUL predictions are satisfactory.
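To illustrate how a threshold and a set of Wiener parameters turn into an RUL distribution, the sketch below evaluates the standard first-passage-time (inverse Gaussian) density of a single linear Wiener process. It covers only the situation after the change point, where one drift and one diffusion coefficient apply; it is not the paper's two-stage formula. The current degradation level `x_now = 100` is a hypothetical value, and the function name is an assumption.

```python
import numpy as np


def wiener_rul_pdf(t, x_now, omega, mu, sigma):
    """First-passage-time density of a linear Wiener process.

    t     : array of positive lead times
    x_now : current degradation level (below omega)
    omega : failure threshold
    mu    : drift (> 0)
    sigma : diffusion coefficient
    """
    d = omega - x_now  # remaining distance to the threshold
    return d / np.sqrt(2 * np.pi * sigma**2 * t**3) * np.exp(
        -((d - mu * t) ** 2) / (2 * sigma**2 * t)
    )


# Threshold and second-stage parameters from the simulation; x_now is hypothetical.
t = np.linspace(0.01, 200.0, 200_000)
pdf = wiener_rul_pdf(t, x_now=100.0, omega=200.0, mu=2.31, sigma=0.78)

# Trapezoidal integration: total probability and mean RUL.
dt = np.diff(t)
area = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * dt)
tp = t * pdf
mean_rul = np.sum(0.5 * (tp[1:] + tp[:-1]) * dt)
print(area, mean_rul)
```

The mean of this density equals the remaining distance divided by the drift, here 100 / 2.31, which gives a quick sanity check on any numerical implementation.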

Figure 11.

RUL results of sample 6.

Figure 12.

RUL PDF of sample 6.

Case study

The sample data come from the XJTU-SY bearing dataset,²² which contains full-life-cycle vibration signals of 15 rolling bearings under three different operating conditions. The test setup comprises an AC motor, a motor speed controller, a shaft, supporting bearings, a hydraulic loading system, and the test bearings. The test bearings are LDK UER204 rolling bearings. The vibration sampling frequency is 25.6 kHz, the sampling interval is 1 min, and the duration of each sample is 1.28 s.

Five sets of full-lifecycle degradation data (Bearing1_1, Bearing1_2, Bearing1_3, Bearing1_4, and Bearing1_5) were selected from the degradation database. The kurtosis index was chosen as the degradation indicator. With a dimensionless maximum kurtosis value defined as 30, the indicator was normalized to a 0–1 scale. Data from the latter two stages of the bearing degradation lifecycle (i.e. the medium-term degradation and late-term degradation stages) were retained, and the resulting degradation curves are shown in Figure 13.

Next, we examined whether the degradation data conform to the two-stage linear Wiener process model (as shown in equation (3)). To test whether a process fits a linear Wiener process, the degradation curve is first differenced, and the differenced signal is then checked against the normality assumption at a significance level of 0.05. Initially, the five degradation curves were each assumed to follow a single linear Wiener process, and the Anderson-Darling (AD) test was applied to each complete curve. The AD test statistics for the five datasets were [13.8657, 5.4448, 6.9816, 17.8811, 6.9216], all exceeding the 5% critical values [0.762, 0.765, 0.765, 0.742, 0.733]. This indicates that the curves do not conform to the assumption of a single linear Wiener process.

Using any of the three aforementioned methods (Bayesian, Pettitt, or CUSUM), optimal change points were detected for the five datasets. Each curve was segmented into two phases based on the detected change point. The two segments (Phase 1 and Phase 2) were then separately tested for adherence to a linear Wiener process. The change point locations and AD test results are summarized in Table 3. The results show that after segmentation, the AD test statistics for both segments of each curve are lower than the 5% critical values, meaning the null hypothesis cannot be rejected. This confirms that both segments of the degradation curves after segmentation conform to a linear Wiener process.
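The increment normality check described above can be sketched with SciPy's `scipy.stats.anderson`, which returns the AD statistic together with sample-size-dependent critical values. The helper name `ad_test_increments` and the synthetic two-phase path are assumptions for illustration; they do not use the XJTU-SY data.

```python
import numpy as np
from scipy import stats


def ad_test_increments(path, level=5.0):
    """Anderson-Darling normality test on the first differences of a path.

    Returns (statistic, critical_value, accepted); accepted is True when
    normality cannot be rejected at the given significance level (percent).
    """
    increments = np.diff(np.asarray(path, dtype=float))
    result = stats.anderson(increments, dist="norm")
    idx = list(result.significance_level).index(level)
    critical = float(result.critical_values[idx])
    statistic = float(result.statistic)
    return statistic, critical, statistic < critical


# Synthetic two-phase linear Wiener path (hypothetical parameters).
rng = np.random.default_rng(0)
phase1 = rng.normal(0.003, 0.01, 75)
phase2 = rng.normal(0.034, 0.05, 75)
path = np.concatenate(([0.0], np.cumsum(np.concatenate((phase1, phase2)))))

print("whole curve:", ad_test_increments(path))
print("phase 1:    ", ad_test_increments(path[:76]))
print("phase 2:    ", ad_test_increments(path[75:]))
```

Because the critical values depend on the number of increments, each curve or segment gets its own threshold, which is why the critical values listed in the text differ slightly from one bearing to the next.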

Figure 13.

Degradation trend of rolling bearings (bearing1_1–bearing1_5).

Table 3.

Test results for the two-stage linear Wiener process.

Bearing	Phase 1 Anderson-Darling statistic (5% critical value)	Phase 2 Anderson-Darling statistic (5% critical value)	Accept the null hypothesis: Two-stage linear Wiener process
Bearing1_1	0.3237 (<0.747)	0.5305 (<0.737)	Yes
Bearing1_2	0.4546 (<0.718)	0.3452 (<0.758)	Yes
Bearing1_3	0.4898 (<0.762)	0.5040 (<0.680)	Yes
Bearing1_4	0.5623 (<0.759)	0.1851 (<0.742)	Yes
Bearing1_5	0.6516 (<0.703)	0.4623 (<0.694)	Yes

To validate the effectiveness of the proposed method, two state-of-the-art lifespan prediction approaches were selected for comparison. Method 1, proposed in Gebraeel et al.,¹⁰ employs a Bayesian updating approach based on a Brownian motion with error process to predict the remaining useful life distribution of equipment using real-time degradation signals. Method 2, introduced in Zhang et al.,¹⁹ is a multi-stage stochastic degradation modeling approach for lifespan prediction that utilizes CUSUM-based change point detection. The Bayesian change point detection method proposed in this paper is designated as Method 3. The model error is defined as the Maximum Absolute Error (MAE), computed here as the sum of the absolute values of the lifespan prediction errors over all prediction points.
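As a minimal sketch of the error metric as defined above (a sum, not a mean, of absolute prediction errors), the snippet below uses hypothetical RUL values; the function name is an assumption.

```python
def summed_absolute_error(predicted_rul, actual_rul):
    """Sum of absolute differences between predicted and actual RUL values."""
    return sum(abs(p - a) for p, a in zip(predicted_rul, actual_rul))


# Hypothetical predictions at three time points.
predicted = [50, 38, 31]
actual = [48, 40, 30]
print(summed_absolute_error(predicted, actual))  # 5
```

Because the metric is a sum, its magnitude grows with the number of prediction points, so values are comparable only between methods evaluated at the same set of time points.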

For the comparative experiments involving the three methods, four sets of degradation bearing data (Bearing1_2, Bearing1_3, Bearing1_4, and Bearing1_5) were selected as offline data, with Bearing1_1 used as the online bearing. Based on the monitored data up to each time point $t_{k}$, the probability density function (PDF) of the remaining useful life (RUL) and the point estimate of the lifespan were predicted.

The degradation curve (excluding the early slow degradation data of the rolling bearing) is shown in Figure 13, and the failure threshold is set to 0.9.

In Method 3, the prior parameters of the model were estimated according to the method in Section “Offline parameter estimation” as $μ_{1} = 0.003, μ_{2} = 0.034, σ_{1} = 0.01, σ_{2} = 0.05, τ = 75$. The offline estimates were then used to set the hyperparameters of the Bayesian change point detection method, $m_{0} = 0.003, v_{0} = 10, a_{0} = 5, B_{0} = 0.0004$. The differenced curves and change point probability graphs are shown in Figure 14. Real-time change point prediction indicates that the change point location is 61. The corresponding remaining life prediction results and PDF distribution are shown in Figures 15 and 18, with fairly accurate remaining life predictions obtained at various prediction points throughout the entire life cycle.
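The online detection step can be sketched as a run-length filter in the style of Adams and MacKay.¹⁸ This is an illustrative sketch under stated assumptions, not the paper's exact algorithm: the hyperparameters are read as a Normal-Gamma prior (mean, pseudo-count, shape, rate), the hazard rate is taken as constant, and the function name `bocpd` and the synthetic increments are hypothetical.

```python
import numpy as np
from scipy import stats


def bocpd(x, m0, v0, a0, b0, hazard):
    """Run-length posterior for Gaussian data with unknown mean and variance.

    R[t, r] is the probability that, after t observations, the current
    segment contains the last r observations.
    """
    n = len(x)
    R = np.zeros((n + 1, n + 1))
    R[0, 0] = 1.0
    # One set of posterior parameters per candidate run length
    # (assumed Normal-Gamma mapping: mean, pseudo-count, shape, rate).
    mu = np.array([m0], dtype=float)
    kappa = np.array([v0], dtype=float)
    alpha = np.array([a0], dtype=float)
    beta = np.array([b0], dtype=float)
    for t, xt in enumerate(x):
        # Student-t predictive density of xt under every run length.
        scale = np.sqrt(beta * (kappa + 1) / (alpha * kappa))
        pred = stats.t.pdf(xt, df=2 * alpha, loc=mu, scale=scale)
        joint = R[t, : t + 1] * pred
        R[t + 1, 1 : t + 2] = joint * (1 - hazard)  # segment continues
        R[t + 1, 0] = joint.sum() * hazard          # segment ends after xt
        R[t + 1] /= R[t + 1].sum()
        # Conjugate update, then prepend the prior for a fresh segment.
        new_mu = (kappa * mu + xt) / (kappa + 1)
        new_beta = beta + kappa * (xt - mu) ** 2 / (2 * (kappa + 1))
        mu = np.concatenate(([m0], new_mu))
        kappa = np.concatenate(([v0], kappa + 1))
        alpha = np.concatenate(([a0], alpha + 0.5))
        beta = np.concatenate(([b0], new_beta))
    return R


# Synthetic increments with a change after 60 points (hypothetical data).
rng = np.random.default_rng(1)
increments = np.concatenate(
    (rng.normal(0.003, 0.01, 60), rng.normal(0.034, 0.05, 40))
)
R = bocpd(increments, m0=0.003, v0=10, a0=5, b0=0.0004, hazard=1 / 75)
run_length = int(np.argmax(R[-1]))
print("estimated change point index:", len(increments) - run_length)
```

Under this reading of the hyperparameters, the prior variance implied by the shape and rate is 0.0004 / (5 − 1) = 0.0001, which corresponds to the first-stage estimate $σ_{1} = 0.01$. With a constant hazard, the normalized mass on run length zero always equals the hazard itself, so in this sketch a change point is read off from the most probable run length rather than from that entry.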

Figure 14.

Bayesian change point detection result (bearing1_1): (a) predicted mean and variance and (b) probability of change point.

Figure 15.

RUL prediction results.

Figure 16.

RUL PDF of Method 1.

For the remaining useful life (RUL) prediction of Bearing1_1, a comparison of the results from the three methods is shown in Table 4, with visualizations provided in Figures 15 to 18. Analysis of the outcomes reveals that Method 1, based on a single-stage model, produces substantial errors in RUL prediction, with forecasts at certain time points reaching unacceptable levels. In contrast, both Method 2 and Method 3, which incorporate change point detection, generally achieve smaller prediction errors. However, the change point identified by the CUSUM-based Method 2 occurs at 68 (with an 8-point lag), resulting in overall larger prediction errors. As can be seen from Table 4, its MAE value is 907, whereas the MAE of the proposed method is 408, the best prediction performance of the three.

Figure 17.

RUL PDF of Method 2.

Figure 18.

RUL PDF of Method 3 (Proposed).

Table 4.

Prediction errors and comparison of the three methods.

Method	Method 1	Method 2	Method 3 (Proposed)
MAE	3625	907	408

Conclusion

To address the multi-stage degradation of rolling bearings with random change points, this paper proposes a novel multi-stage RUL prediction method based on Bayesian change point detection. Through analysis and experimental validation, the following conclusions are drawn:

(1) The proposed life prediction method employs Bayesian change point detection, which in the simulation study improves the accuracy of change point detection by 78.5% relative to the Pettitt method and by 85% relative to the CUSUM method, thereby enabling high-precision multi-stage RUL prediction.

(2) This method is applicable throughout the entire lifecycle of bearing degradation, both before and after the occurrence of change points. In the first stage (before the change point) in particular, it leverages prior estimates of the change point and of the second-stage parameters, derived from offline data, to achieve relatively accurate RUL predictions.

(3) The proposed method adopts online inference for prediction, where change points are detected by generating the precise distribution of the next unobserved data point given the observed sequence. This approach belongs to real-time online change point detection and, when combined with real-time parameter updating, can be applied to online RUL prediction.

Footnotes

Handling Editor: Chenhui Liang

ORCID iD

Wenping Lei

Author contributions

Wenping Lei: Conceptualization, Investigation, and Writing - original draft preparation; Wangshen Hao: Investigation, Software, and Formal analysis; Chenyang Li: Writing - original draft preparation, Software, and Supervision; Yifei Zhang: Software, Visualization, and Formal analysis. All authors have read and agreed to the published version of the manuscript.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is partially supported by the National Natural Science Foundation of China (51775515).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The XJTU-SY dataset is available at: .

References

1. Chen. Study on algorithm for rolling bearing remaining useful life prediction and development of monitoring software [D]. Harbin Institute of Technology, China, 2021.

2. Liu, Wang, et al. Fault feature extraction method of rolling bearings based on FIF-CYCBD. J Zhengzhou Univ (Eng Sci) 2022; 43(4): 36–40.

3. Wen, Yuan. Multiple-phase modeling of degradation signal for condition monitoring and remaining useful life prediction. IEEE Trans Reliab 2017; 66(3): 924–938.

4. Wang, Zhang. Early defect identification: application of statistical process control methods. J Qual Maint Eng 2008; 14(3): 225–236.

5. Data-driven remaining useful life prediction theory and applications for equipment. National Defense Industry Press, 2016, pp.1–19.

6. Tseng, Tang. Determination of burn-in parameters and residual life for highly reliable products. Nav Res Logist 2003; 50(1): 1–14.

7. Peng, Tseng. Mis-specification analysis of linear degradation models. IIE Trans 2004; 36(12): 1161–1170.

8. Lee, Tang. A modified EM-algorithm for estimating the parameter of inverse Gaussian distribution based on time-censored Wiener degradation data. Stat Sin 2007; 17(3): 873.

9. Joseph, IT. Reliability improvement experiments with degradation data. IEEE Trans Reliab 2006; 55(1): 149–157.

10. Gebraeel, Lawley, et al. Residual-life distributions from component degradation signals: a Bayesian approach. IIE Trans 2005; 37(6): 543–557.

11. Shi, Xue. Degradation data driven online prediction for equipment residual life. Comput Eng Appl 2016; 52(23): 249–254.

12. Wang, Peng, et al. A two-stage data-driven-based prognostic approach for bearing degradation problem. IEEE Trans Industr Inform 2016; 12(3): 924–932.

13. Huang. Prediction method of remaining useful life for rolling bearing based on multi-degradation stage assessment. Central South University, 2023.

14. TS. An application of the EM algorithm to degradation modeling. IEEE Trans Reliab 2008; 57(1): 2–13.

15. Wang, Tang, Joo Bae, et al. Bayesian analysis of two-phase degradation data based on change-point Wiener process. Reliab Eng Syst Saf 2018; 170: 244–256.

16. Dong, Zheng, et al. Remaining useful life prognostic method based on two-stage adaptive Wiener process. Acta Autom Sin 2022; 48(2): 539–553.

17. Xie, Xiong. Exploring the ability of the Pettitt method for detecting change point by Monte Carlo simulation. Stoch Environ Res Risk Assess 2014; 28: 1643–1655.

18. Adams, Mackay DJC. Bayesian online change point detection. arXiv preprint arXiv:0710.3742, 2007.

19. Zhang, Gao, et al. A residual useful life prediction approach for equipments with multi-state stochastic degradation. J Syst Eng 2017; 32(1): 1–7.

20. Zhang, et al. A novel lifetime estimation method for two-phase degrading systems. IEEE Trans Reliab 2019; 68(2): 689–709.

21. Fan, Wang. Residual life and optimal maintenance decision for a piece of equipment. National Defense Industry Press, 2018, pp.20–30.

22. Lei, Han, Wang, et al. XJTU-SY rolling element bearing accelerated life test datasets: a tutorial. J Mech Eng 2019; 55(16): 1–6.

A Bayesian change point detection-based method for rolling bearing remaining useful life prediction
