RSF-based model for predicting pump failure trends in tunnels

Abstract

The water pump is a crucial piece of electromechanical equipment for ensuring the safety of tunnels. It is therefore essential to understand the performance trend of pumps in order to prevent failures. In this paper, basic information and multi-year failure records of pumps in 15 operating tunnels were collected. According to the data characteristics, a data-filling model based on XGBoost is developed to address the issue of censored data. Considering that most pumps are still in operation, a failure prediction model based on Random Survival Forest (RSF) is designed by incorporating survival analysis principles. The proposed Pump Failure Trend Prediction Model (PFTPM) overcomes difficulties caused by the lack of early data and the small number of old pumps. We identify two phases of failure: the first phase exhibits a bathtub-shaped failure rate curve, while the second phase is characterized by a lower failure risk. The importance of considering rainfall, pump operating time, and performance changes for effective maintenance planning is emphasized. Furthermore, we summarize the failure evolution law of various types of pumps to amend the maintenance cycle in the existing specification. Overall, this paper integrates innovative big-data technologies into the traditional maintenance data of tunnel pumps.

Keywords

Tunnel, water pump, failure prediction, random survival forest

Introduction

The pump plays an important role in the safe operation of a tunnel. Because tunnels lie in low terrain, a pump failure during the rainy season or a rainstorm can lead to serious accidents such as ponding or flooding, which severely compromises tunnel safety. For example, in April 2013, a drainage pump in an underground tunnel in Zhuhai failed, causing water to accumulate in the tunnel, forcing cars to stall, and causing traffic accidents. Studying the law of pump failure and arranging pump maintenance and repair plans accordingly has therefore become a hot research topic.^1–3

The failure mechanism method establishes a physical model from the perspective of the causes of equipment failure to reflect the performance change of the equipment. Orsagh et al.⁴ combined the Totaro model and the crack propagation model to establish a full-lifecycle failure prediction model for bearing components. On the basis of studying the displacement efficiency and wear degradation of the axial piston pump, Guo et al.⁵ put forward a life prediction model based on the Weibull distribution. Failure mechanism models have high prediction accuracy, but the methods are highly targeted and hard to extend to other devices.

Data-driven approaches are mainly implemented by artificial intelligence (AI) methods, such as neural networks and support vector machines (SVM), which are widely used in fault diagnosis⁶ and condition-based maintenance. The former is broadly used for utilizing data from multiple sensors to achieve multiple fault warnings,^7,8 while the latter is more conducive to preventive maintenance decisions in advance. Cui et al.⁹ applied the autoregressive integrated moving average model (ARIMA) and long-short-term memory recurrent neural networks (LSTM) to create a prediction method that has a better effect on the performance trend prediction of aircraft hydraulic pumps. Dai et al.¹⁰ use fuel pump data to forecast the degradation trend of airborne fuel pumps based on the singular value trend decomposition, using an algorithm combining the least squares support vector machine (LSSVM) and particle swarm optimization (PSO). Kimera and Nangolo¹¹ use the SVM to predict terminal pump failure trends, pointing out that the quality and quantity of historical data sets significantly impact forecast accuracy.

Data-driven approaches show unique advantages, but they are highly dependent on data integrity. On the one hand, AI algorithms can be applied more widely and quickly without a physical model, as long as data can be collected.¹² On the other hand, their performance depends on the data. Missing data limits the effectiveness of feature extraction,¹³ resulting in poor model training and ultimately affecting the accuracy of prediction results.

However, owing to the management level of tunnel operation and maintenance enterprises, data for many pumps are not available for the initial stage of operation, and maintenance data are seriously incomplete. In addition, most tunnels have not been in service for long, many pumps have not yet reached the scrapping stage, and there is little failure data for pumps older than 10 years. Therefore, the historical failure data of tunnel water pumps is both left- and right-censored, which brings substantial obstacles to failure trend prediction.^14,15 If censored data is not considered, the predicted results will deviate notably.¹⁶ Nevertheless, existing research does not consider the censored characteristics of fault data collected in practice, so how to overcome the problem of censored data and design a performance prediction model for water pumps is the focus and difficulty of this research.

The processing of censored data falls into two types. One is the data-filling method, such as substitution, standard statistical routines, various filling methods, and parameter estimation.^17–19 Alrumayh et al.²⁰ used a Bayesian estimator to estimate the distribution and the parameters of the reliability function, which was verified on multiple datasets with censored samples. When there are few missing values, these operations are simple and popular, but when much of the data is unknown they may lead to low prediction accuracy.²¹ They also suffer from significant filling errors, long computation times, and the difficulty of selecting a data distribution. The other type reduces the impact of censored data on model performance through model design. For example, the survival analysis method has certain advantages in dealing with right-censored data. Hsieh and Chen²² proposed a two-stage redistribution algorithm to estimate the survival function of failure time under the current state with independent censored data. Mukherjee et al.²³ improved the Weibull survival model to solve the problem of right-censored data, thus improving the robustness of research results. In addition to the Weibull distribution model, traditional survival analysis methods such as Kaplan-Meier estimation and the Cox proportional hazard model can also meet the basic needs of censored data prediction.^24–27 However, their advantages are not apparent when processing high-dimensional data with a heavy censoring rate.^28,29 With the development of machine learning, methods of this kind, which impose few prior restrictions, have gradually been applied to censored data filling and prediction. Compared with previous methods, the Random Survival Forest (RSF) method has several unique advantages. First, it does not rely on the assumptions of traditional survival analysis methods, so it performs much better on high-dimensional data³⁰ and avoids the problem of over-fitting. Second, RSF is based on the aggregated results of many trees, which makes it insensitive to outliers and highly robust.^31,32 Finally, it does not require complex parameter optimization and shows excellent interpretability.³³ Thus, RSF can meet the requirements of pump data analysis.

This paper combines the ideas of survival analysis and machine learning. It establishes a failure prediction model for water pumps that overcomes the shortcomings of traditional methods in censored data processing and prediction and improves prediction accuracy.

The rest of this article is organized as follows. Section 2 describes the methodology and procedures used to construct the failure trend prediction model of the tunnel water pump. Section 3 analyzes the characteristics of the collected water pump data. Section 4 applies the proposed failure trend prediction model to the tunnel water pump data and compares it with other models to demonstrate its effectiveness. Section 5 applies the model to numerical studies and gives some suggestions for the maintenance and management of pumps in tunnels. Finally, the conclusion and future work are presented in Section 6.

Methodology

Feature extraction based on Spearman correlation and VIMP

Feature extraction is essential for constructing relevant features from the original collected data, thereby improving model efficiency. This study used Spearman correlation analysis and Variable Importance (VIMP) to capture the relationships among features and between features and the target variable.

Spearman correlation is a non-parametric, rank-based measure of the monotonic association between two variables. It is used here to detect multicollinearity among features, and because it operates on ranks it can be applied to continuous and ordinally coded categorical variables simultaneously.
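As an illustration of how such a coefficient can be computed, the following minimal sketch ranks each variable (averaging tied ranks) and then takes the Pearson correlation of the ranks. The function names and the sample lift and power values are hypothetical and are not taken from the collected data set.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical lift and power values for five pumps (illustration only).
lift = [14.0, 30.0, 23.0, 55.0, 4.5]
power = [1.5, 14.0, 22.0, 90.0, 0.45]
rho = spearman(lift, power)
```

A coefficient close to 1 or -1 between two features would flag them as candidates for removal of one of the pair.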

VIMP is widely used in various machine learning techniques to evaluate the contribution of variables to the prediction of the target variable. It assesses the extent to which the model’s predictive accuracy changes when variables are randomly removed or rearranged. For a variable X, out-of-bag (OOB) samples are put into the in-bag survival tree, and a daughter node is randomly assigned to it whenever the split for X is encountered. After that, each tree’s cumulative hazard function (CHF) is calculated and averaged. The VIMP for X is equal to the original ensembles’ prediction error subtracted from the new ensembles’ prediction error obtained by randomizing X assignments.

Data filling model based on XGBoost

Due to the low level of early information management, fault records from the early service years of some pumps are missing. Effectively filling in the missing data helps ensure the accuracy of fault trend prediction. However, determining the data distribution is challenging. Machine learning techniques, which are not limited by data distribution assumptions, are increasingly being used to fill left-censored data. Extreme Gradient Boosting (XGBoost) is a tree-based ensemble learning method,³⁴ which can perform regression and classification and is widely used in device fault diagnosis and prediction.^35–37 Its advantages include fewer hyperparameters,³⁸ better training performance on imbalanced data sets,³⁶ and fast computation.³⁷ Wang et al.³⁹ applied XGBoost to predict the fault of the fan bearing. Zhang et al.⁴⁰ combined random forest and XGBoost for failure detection of offshore fans, and the results showed that the model was robust to fans under different working conditions. Therefore, we transform the early data deficiency into a binary classification problem of fault diagnosis and use the XGBoost algorithm as a prediction model to fill the censored data.

Failure trend prediction model based on RSF

Combined with the results of key feature extraction and the calculation principle of the random survival forest algorithm (Figure 1), the modeling process of failure trend prediction based on RSF is as follows.

Figure 1.

Random survival forests algorithm diagram.

The calculation principle of RSF algorithm

First, B bootstrap samples are drawn with replacement from the original data set. On average, each bootstrap sample excludes about 37% of the original data, which forms the OOB data used as the testing sample for the RSF model. Secondly, a binary recursive survival tree is grown for each bootstrap sample. At each tree node, P randomly selected variables are considered for splitting, and the variable that maximizes the survival difference among the child nodes is chosen. Thirdly, each tree is grown to its maximum extent, subject to the constraint that the sample size in each terminal node is no smaller than the specified minimum value. Fourthly, the cumulative hazard function (CHF) is calculated for each tree, and the average of the CHFs from all trees is taken as the CHF of the overall forest. Finally, the prediction error of the overall CHF is computed using the OOB data.
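The 37% figure follows from sampling n records with replacement: the chance that a given record is never drawn is (1 - 1/n)^n, which approaches 1/e (about 0.368). The short simulation below is only a sketch of the bootstrap step; the sample size, number of trees, and function name are illustrative assumptions.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; indices never drawn are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    oob = [k for k in range(n_samples) if k not in chosen]
    return in_bag, oob


rng = random.Random(0)          # fixed seed so the sketch is reproducible
n_samples, n_trees = 1000, 200  # illustrative sizes, not the paper's settings

oob_fractions = []
for _ in range(n_trees):
    _, oob = bootstrap_split(n_samples, rng)
    oob_fractions.append(len(oob) / n_samples)

# Average share of records left out of a bootstrap sample (close to 1/e).
mean_oob = sum(oob_fractions) / n_trees
```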

There are four parameters that need to be optimized in the RSF model. First, n_estimators represents the number of decision trees in the forest. Secondly, when a decision tree is constructed, each node considers a subset of features, and max_features is the maximum number of features in that subset. Thirdly, max_depth indicates the maximum depth of the decision tree. Finally, min_samples_leaf is the smallest sample size on the leaf nodes and controls how much noise in the training data the model captures. The parameters were adjusted by a random grid search method to obtain the optimal parameter combination.
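A random search over these four parameters can be sketched as follows. This is only an outline of the search loop: the candidate values in param_space are hypothetical, and toy_score is a placeholder standing in for a cross-validated model score (for example a C-index), not a real model evaluation.

```python
import random


def random_search(param_space, score_fn, n_iter, seed=0):
    """Sample n_iter random combinations and keep the one with the best score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values for the four RSF parameters.
param_space = {
    "n_estimators": [50, 100, 200],
    "max_features": [2, 4, 6, 8],
    "max_depth": [3, 5, 7],
    "min_samples_leaf": [1, 3, 5],
}


def toy_score(params):
    """Placeholder objective; a real run would train and cross-validate a model."""
    return -(params["max_depth"] / params["min_samples_leaf"])


best_params, best_score = random_search(param_space, toy_score, n_iter=20, seed=0)
```

In practice score_fn would wrap k-fold cross-validation of the survival model on the training data.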

Cumulative hazard function and failure rate $λ^{*} (t | X_{i})$

Let $(T_{i, h}, δ_{i, h})$ denote the failure time and censoring information of pump $i$ in terminal node $h$ of a tree. There are $n (h)$ pumps in terminal node $h$ , with data $(T_{1, h}, δ_{1, h}), \dots, (T_{n (h), h}, δ_{n (h), h})$ . The pairs are sorted in ascending order of $T_{i, h}$ , and $t_{l, h}$ denotes the $l$ -th distinct time in this ordering.

Let $d_{l, h}$ denote the number of failures at time $t_{l, h}$ , and $Y_{l, h}$ the number of pumps at risk at time $t_{l, h}$ , that is, those that have neither failed nor been right-censored before $t_{l, h}$ . The cumulative hazard function (CHF) of terminal node $h$ is then defined in formula (1).

{\hat{H}}_{h} (t) = \sum_{t_{l, h} \leq t} \frac{d_{l, h}}{Y_{l, h}}

(1)

Since pump $i$ has a multidimensional feature vector $x_{i}$ and the survival tree is a binary tree, $x_{i}$ falls into exactly one terminal node. Therefore, the CHF of each pump is equal to the CHF of its terminal node. Define $H (t | x_{i})$ as the CHF of water pump $i$ ; its calculation is shown in formula (2).

H (t | x_{i}) = {\hat{H}}_{h} (t), \quad \text{if } x_{i} \in h

(2)

Formula (2) gives the CHF from a single tree. The overall CHF is obtained by averaging over the $B$ trees, as shown in formula (3), where $H_{b}^{*} (t | x_{i})$ represents the CHF of water pump $i$ in survival tree $b$ . The failure rate of water pump $i$ is then calculated as shown in formula (4).

H_{e}^{*} (t | x_{i}) = \frac{1}{B} \sum_{b = 1}^{B} H_{b}^{*} (t | x_{i})

(3)

λ^{*} (t | x_{i}) = \frac{H_{e}^{*} (t | x_{i}) - H_{e}^{*} (t - 1 | x_{i})}{1 - H_{e}^{*} (t - 1 | x_{i})}

(4)
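Formulas (1), (3), and (4) can be traced numerically with a small sketch. The code below transcribes the formulas as written; the function names and the five-pump example node are hypothetical, and the at-risk count is taken as the number of pumps whose recorded time is at least the failure time in question.

```python
def node_chf(times, events, t):
    """Formula (1): sum of d/Y over the distinct failure times up to t in one node."""
    total = 0.0
    failure_times = sorted({ti for ti, e in zip(times, events) if e})
    for t_l in failure_times:
        if t_l > t:
            break
        d = sum(1 for ti, e in zip(times, events) if e and ti == t_l)  # failures at t_l
        y = sum(1 for ti in times if ti >= t_l)                        # pumps at risk
        total += d / y
    return total


def ensemble_chf(tree_chfs):
    """Formula (3): average of the per-tree CHF values for one pump at one time."""
    return sum(tree_chfs) / len(tree_chfs)


def failure_rate(h_now, h_prev):
    """Formula (4): increment of the CHF between t-1 and t, scaled by 1 - H(t-1)."""
    if h_prev == 1.0:
        raise ValueError("formula (4) is undefined when the previous CHF equals 1")
    return (h_now - h_prev) / (1.0 - h_prev)


# Hypothetical terminal node with five pumps (time in years, True = failure observed).
times = [2, 3, 3, 5, 7]
events = [True, True, False, True, False]

chf_year2 = node_chf(times, events, 2)   # 1/5
chf_year3 = node_chf(times, events, 3)   # 1/5 + 1/4
rate_year3 = failure_rate(chf_year3, chf_year2)
```

In a full forest, node_chf would be evaluated in the terminal node that the pump reaches in each tree, and ensemble_chf would average those values before failure_rate is applied.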

Pump Failure Trend Prediction Model

The model framework is shown in Figure 2 to provide a visual representation of the key steps and components involved in our proposed Pump Failure Trend Prediction Model (PFTPM). PFTPM includes five steps: data preprocessing, creation of the feature extraction model based on VIMP, creation of the data filling model based on XGBoost, creation of the failure trend prediction model based on RSF, and pump failure trend analysis as well as management insights.

Figure 2.

Failure trend prediction process.

Data collection and analysis

Data collection

The pump is a piece of complex mechanical equipment with various types and characteristics under different working conditions. Service performance degradation is often the result of a combination of factors. For example, temperature, equipment operating parameters, load, surrounding working environment, and other characteristics are important factors affecting the service performance of water pumps. According to the operating mechanism of pumps, inappropriate operating parameters and loads can increase the pressure on hydraulic components, while adverse working environments can lead to pump clogging. These factors accelerate the aging and failure of pumps to varying degrees. Therefore, in modeling equipment failure trends, it is often necessary to consider the influence of multiple factors.

In the data collection phase, this study focuses on ensuring the adequacy of variables. Certain factors, such as temperature and vibration data, are unavailable because traditional maintenance modes lack sensor devices and data storage equipment. Through communication with experts in tunnel equipment operation, we ensure that the collected variables adequately cover the pump failure characteristics and that the data is obtainable, which supports the effectiveness of subsequent analysis and research. In this paper, the basic information of 602 pumps from 15 tunnels in service in Shanghai and Hangzhou and the available historical fault data are collected, including basic equipment data, fault maintenance feature data, and climatic conditions feature data. A pump data collection form was designed, listing all variables and their definitions as presented in Table 1, and was provided to the tunnel operating company for completion. In addition, a thorough investigation of the equipment's historical records was conducted to ensure the integrity and accuracy of the provided data.

Table 1.

Data feature parameter variables.

No.	Feature category	Feature symbols	Explanation
1	Basic Feature ( $X_{basic}$ )	$Tunnel$	The tunnel where the pump serves.
2	Basic Feature ( $X_{basic}$ )	$Type$	Types of water pumps, including rainwater pumps, central pumps, wastewater pumps and lifting pumps.
3	Basic Feature ( $X_{basic}$ )	$Manu$	Water pump manufacturer.
4	Basic Feature ( $X_{basic}$ )	$t_{age}$	The service age of the pump, accumulated year by year from the time the pump is activated to the end of the observation time.
5	Parameter Feature ( $X_{perform}$ )	$Flow$	Operating capacity indicator of pumps.
6	Parameter Feature ( $X_{perform}$ )	$Lift$	Operating capacity indicator of pumps.
7	Parameter Feature ( $X_{perform}$ )	$Power$	Operating capacity indicator of pumps.
8	Climatic Conditions Feature ( $X_{weather}$ )	$Prec$	Annual cumulative precipitation from China Meteorological Administration.
9	Climatic Conditions Feature ( $X_{weather}$ )	$t_{minimum}$	Annual minimum temperature from China Meteorological Administration.
10	Climatic Conditions Feature ( $X_{weather}$ )	$t_{maximum}$	Annual maximum temperature from China Meteorological Administration.
11	Fault Maintenance Feature ( $X_{fault}$ )	$Fail$	The total number of failures in the past.
12	Fault Maintenance Feature ( $X_{fault}$ )	$T$	Failure time, that is, the time point when the fault is observed.

Feature analysis

Comparing the actual service life of each pump with the years of data recording (Figure 3), we found that only two of the pumps in the 15 tunnels started recording failure data when they were put into service. Many of the pump failure records were incomplete. Thus, pump failure data is generally left-censored, so filling in the missing values is essential. In addition, since the cut-off date for observing pump failure data in this paper is December 31, 2021, the vast majority of the pumps are still in use and have not reached the end-of-life stage, which makes the data right-censored as well. Therefore, the pump data has both left and right censoring characteristics.

Figure 3.

Pump data characteristics analysis.

Table 2 presents the basic statistical results of the continuous variables. Among the fundamental pump parameters, the mean and standard deviation of lift and power are considerably lower than those of flow, indicating that flow varies more widely than lift and power. Regarding the climatic condition factors, the annual minimum and maximum temperatures are relatively stable in comparison to the annual accumulated precipitation.

Table 2.

Basic statistical results of the continuous variables.

Continuous variables	Mean	SD	Min	Max	Q1	Q3
$Flow$	43.32	55.28	2.00	410.00	12.00	45.50
$Lift$	23.61	12.82	4.50	55.00	14.00	30.00
$Power$	14.01	14.53	0.45	90.00	1.50	22.00
$Prec$	1428.11	120.09	1243.51	1661.01	1345.36	1467.16
$t_{minimum}$	−5.16	0.77	−6.70	−3.70	−5.55	−5.14
$t_{maximum}$	38.16	0.32	37.60	38.73	38.10	38.25

Case study

Data preprocessing

Different processing methods are applied to different data types. Continuous variables $x$ , such as total precipitation, flow, lift, and power, are normalized according to formula (5). Discrete variables such as pump type, tunnel, and manufacturer are sequentially coded. For example, the four types of pumps (rainwater pumps, wastewater pumps, central pumps, and lifting pumps) are coded in sequence from 0 to 3.

x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

(5)
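The two preprocessing steps can be sketched in a few lines. The helper names are hypothetical, the flow values reuse the minimum, first quartile, and maximum from Table 2, and the mapping of pump types to codes 0 to 3 is assumed to follow the order in which the types are listed in the text.

```python
def min_max_normalize(values):
    """Formula (5): rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ordinal_encode(labels, categories):
    """Replace each label by its position in the given category list."""
    code = {category: k for k, category in enumerate(categories)}
    return [code[label] for label in labels]


# Assumed coding order for pump types (0 to 3).
PUMP_TYPES = ["rainwater", "wastewater", "central", "lifting"]

flow_scaled = min_max_normalize([2.0, 12.0, 410.0])
type_codes = ordinal_encode(["central", "rainwater"], PUMP_TYPES)
```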

Feature extraction

In order to eliminate redundant features and preserve those features that are relevant to the predicted target, the study explores the correlations among features and between features and the target variable. First of all, to address the issue of multicollinearity among features, a correlation analysis is conducted to capture the relationships among features. Given the presence of both continuous and categorical variables in the collected features, Spearman correlation analysis is employed in this study, and the results are depicted in Figure 4. The heatmap reveals a strong correlation of 0.83 between the variables Lift and Power, indicating the need to remove one of them to enhance the stability of the constructed model.

Figure 4.

Spearman correlation analysis.

Next, VIMP³⁰ is used to decide which of the strongly correlated variables to delete. VIMP is a feature extraction method in the random survival forest. A high positive VIMP value indicates that the variable strongly affects the pump failure trend; otherwise, the variable has a weak impact on the pump failure trend.

Based on the 12 types of feature data listed in Table 1, nine key features, namely tunnel in service, pump type, manufacturer, flow, lift, power, service age, total precipitation, and number of previous failures, are extracted by the VIMP method. The rank and importance of each feature are shown in Table 3. The VIMP value of the variable Power is greater than that of the variable Lift, so Lift is removed to avoid multicollinearity between variables. The VIMP values of the remaining eight key features are all greater than 0, which indicates that they each have some impact on the pump failure trend. The pump's service age ranks first, followed by the total precipitation, demonstrating that the service age of the pump and the local rainfall are highly correlated with the failure trend of the pump.

Table 3.

VIMP analysis.

Key feature	VIMP values	Rank
$t_{age}$	0.3344 ± 0.0516	1
$Prec$	0.0284 ± 0.0074	2
$Power$	0.0258 ± 0.0069	3
$Tunnel$	0.0071 ± 0.0050	4
$Fail$	0.0034 ± 0.0018	5
$Lift$	0.0023 ± 0.0030	6
$Manu$	0.0011 ± 0.0029	7
$Flow$	0.0006 ± 0.0039	8
$Type$	0.0001 ± 0.0007	9

Data filling

Considering that pump failure is closely related to its characteristics and usage, pump type, flow, power, total precipitation, service age, number of previous failures, manufacturer, and service tunnel are used as input variables. The failure status (whether the water pump has failed) in a specific year is the output variable. The left-censored data-filling model is established by formula (6):

{\hat{Ω}}_{XGBoost} = f_{XGBoost} (Type, Flow, Power, Prec, t_{age}, Fail, Manu, Tunnel)

(6)

Where, ${\hat{Ω}}_{XGBoost}$ represents the failure discrimination result of pump left censored data predicted by XGBoost, and $f_{XGBoost}$ means the classification model based on XGBoost.

In this paper, the water pump failure data of seven tunnels, such as the Beidi Road tunnel and Jiaohuan tunnel, were selected as the training set, because failure data has been recorded for these pumps since the tunnels entered service. The trained model is then applied to fill the left-censored fault data of the pumps in the remaining eight tunnels.

Failure trend prediction model

The input data is the characteristic information $X_{i}$ of water pump $i$ , where $X_{i} = (Type, Flow, Power, Prec, t_{age}, Fail, Manu, Tunnel)$ . The output data is the failure time $T$ and the TRUE-FALSE censoring information $δ$ for individuals of different service ages. The operation data of a pump is divided by service age into several records, each of which describes the characteristics of the pump at a certain service age. Assume that the cut-off time for observation is $t_{ob}$ , the current service age is $t_{age}$ , and the service ages at the times of failure are $t_{age}^{f 1}, t_{age}^{f 2}, \dots, t_{age}^{fi}$ , where the superscript $i$ is the total number of failures occurring up to $t_{age}$ . The calculation of failure time $T$ and censoring information $δ$ is shown in Figure 5 and can be divided into two scenarios:

If $t_{ob} \geq t_{age}$ , a pump that has experienced at least one failure has $T = \max \{ t_{age}^{f 1}, t_{age}^{f 2}, \dots, t_{age}^{fi} \}$ and $δ = TRUE$ , while a pump that has not failed has $T = t_{age}$ and $δ = FALSE$ .

If $t_{ob} < t_{age}$ , the operating status of the pump at this time is not within the range of observation, and the latest failure time cannot be determined. Therefore, regardless of whether the pump has failed before, its failure time is equal to the current service age $t_{age}$ , $δ = FALSE$ .
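The two scenarios above can be written as a small function. This is a sketch under the assumption that the observation cut-off and the service ages are expressed on the same yearly scale; the function and argument names are hypothetical.

```python
def failure_time_and_status(t_ob, t_age, failure_ages):
    """Return (T, delta) for one record of a pump at service age t_age.

    t_ob         -- observation cut-off, on the same scale as the service age
    t_age        -- service age described by this record
    failure_ages -- service ages at which failures were recorded
    """
    if t_ob >= t_age:
        # Only failures that occurred up to the current service age count.
        observed = [f for f in failure_ages if f <= t_age]
        if observed:
            return max(observed), True   # latest failure, event observed
        return t_age, False              # no failure yet, record is censored
    # The record lies outside the observation window, so it is always censored.
    return t_age, False
```

For example, a record at service age 6 with failures at ages 2 and 5 and a cut-off of 10 yields T = 5 with δ = TRUE, while the same record with a cut-off of 4 yields T = 6 with δ = FALSE.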

Figure 5.

Calculation method of failure time and censoring information.

After converting the data of 602 water pumps into input data based on their service age, the dataset consists of 2087 samples. Subsequently, these samples are allocated according to a specified proportion. The training and validation sets are assigned 70% of the total samples and are utilized for model training. The validation set is further divided during the process of k-fold cross-validation, serving the purpose of optimizing model parameters and mitigating overfitting. The remaining 30% of the samples constitute the test set, which is employed to evaluate the predictive performance of the model. Consequently, the data distribution yields a training set with a sample capacity of 1169, a validation set with a sample capacity of 292, and a test set with a sample capacity of 626.

The random grid search method is employed to optimize parameters and perform cross-validation within a specified parameter space, ultimately resulting in the identification of the optimal parameter configuration, as shown in Table 4.

Table 4.

The value of the optimal model parameter.

Model parameters	Parameter values
n_estimators	100
max_features	6
max_depth	5
min_samples_leaf	3

Model performance evaluation

We evaluate the prediction effects of the data-filling model and failure prediction model, respectively. The former uses three common evaluation indicators, while the latter compares the predicted concordance index (C-index) with the traditional survival analysis model.

Data filling model

To analyze the validity of data filling, this paper selects $Precision$ (formula 7), $Recall$ (formula 8), and $F 1$ (formula 9) as indicators to evaluate.

Precision = \frac{TP}{TP + FP}

(7)

Recall = \frac{TP}{TP + FN}

(8)

F 1 = \frac{2 \cdot Recall \cdot Precision}{Recall + Precision}

(9)

Where $TP$ is the number of samples correctly identified as failures, $FP$ is the number of samples incorrectly identified as faults, and $FN$ is the number of samples incorrectly identified as nonfailures.
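Formulas (7) to (9) translate directly into code. The counts used in the example are hypothetical and serve only to show the arithmetic; they are not the confusion-matrix counts behind the reported results.

```python
def precision_recall_f1(tp, fp, fn):
    """Formulas (7)-(9): precision, recall, and their harmonic mean F1."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1


# Hypothetical counts: 80 true positives, 20 false positives, 10 false negatives.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
```

Because F1 is a harmonic mean, it always lies between precision and recall and is pulled toward the smaller of the two.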

The evaluation results of the model are shown in Table 5. $Precision$ , $Recall$ , and $F 1$ are all no less than 0.88, confirming the validity of the data-filling model.

Table 5.

Model evaluation results.

$Precision$	$Recall$	$F 1$
0.880	0.883	0.881

Failure prediction model

The Cox proportional hazards (CPH) model and the random survival forest (RSF) model are each trained on the pump data set. The advantage of the CPH model is that it does not need to make any specific probability distribution assumption for baseline survival time, so it has become one of the statistical methods most widely used in survival analysis.⁴¹ It is commonly used in research to quantify the degree of influence of covariates on equipment failure and the probability of survival event occurrence at each time point.⁴² Harrell's concordance index (C-index)³⁰ is used to compare the prediction errors of the two models; it estimates the probability that the predicted results are consistent with the actual results. The C-index is calculated as follows:

First, all samples are paired to generate all possible data pairs in the data.

Secondly, if the failure time of pump A in a data pair is earlier than that of pump B but the data of pump A is censored, the pair is deleted. If the two failure times in a pair are the same and both are censored, the pair is deleted, too. The number of remaining data pairs is recorded as $P$ .

Thirdly, for each remaining data pair, suppose that the failure time of pump A is $T_{A}$ and that of pump B is $T_{B}$ . When $T_{A} \neq T_{B}$ , the pair is recorded as 1 if the pump with the shorter failure time is also predicted to fail earlier, and as 0.5 if the predicted results are the same. When $T_{A} = T_{B}$ and both samples fail, the pair is recorded as 1 if the predicted results are the same, and as 0.5 otherwise. When $T_{A} = T_{B}$ but only one sample fails, the pair is recorded as 1 if the failed sample is predicted to fail earlier, and as 0.5 otherwise. The sum of these records is $C_{p}$ .

Fourthly, calculate the C-index, as shown in formula (10).

C = \frac{C_{p}}{P}

(10)
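The four steps can be sketched as a single function. Here the model prediction is represented by a risk score, where a higher score is assumed to mean an earlier predicted failure; this representation and the function name are illustrative assumptions rather than part of the described procedure.

```python
from itertools import combinations


def c_index(times, events, risk):
    """C-index from the pairwise steps; a higher risk means earlier predicted failure."""
    concordance = 0.0   # C_p
    permissible = 0     # P
    for a, b in combinations(range(len(times)), 2):
        if times[a] != times[b]:
            short, long_ = (a, b) if times[a] < times[b] else (b, a)
            if not events[short]:
                continue                      # shorter time is censored: drop pair
            permissible += 1
            if risk[short] > risk[long_]:
                concordance += 1.0
            elif risk[short] == risk[long_]:
                concordance += 0.5
        else:
            if not events[a] and not events[b]:
                continue                      # tied times, both censored: drop pair
            permissible += 1
            if events[a] and events[b]:
                concordance += 1.0 if risk[a] == risk[b] else 0.5
            else:
                failed, other = (a, b) if events[a] else (b, a)
                concordance += 1.0 if risk[failed] > risk[other] else 0.5
    if permissible == 0:
        raise ValueError("no permissible pairs")
    return concordance / permissible          # formula (10)
```

A value of 1 means every usable pair is ordered correctly, and a value around 0.5 corresponds to predictions that carry no ordering information.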

As shown in Table 6, the results prove that the prediction accuracy of the XGBoost-RSF model is higher than that of traditional survival analysis methods, which confirms that the model proposed in the paper is effective.

Table 6.

The C-index of RSF and CPH.

Model	C-index
RSF	0.921
CPH	0.912

Discussion

Results analysis

A data-driven model based on censored data interpolation is established to predict the failure trend of pumps in the tunnel. Some insights are provided for formulating pumps’ maintenance strategy. The general failure rate curve and cumulative hazard curve of pumps are shown in Figure 6(a) and (b). The general failure law of the water pumps is divided into two stages: the first 10 years is the first stage, which presents characteristics of the bathtub curve. The failure rate exceeds 10% in the second year, and the following 6 years are at a low level of failure risk, which means that the first 2 years are in the run-in period when pumps are put into use. It reaches the peak failure rate in the 10th year, meaning pumps are in the wear and tear failure period, which is not conducive to maintaining stable tunnel operation. Then it comes to the second stage in the 11th to 18th years, which presents a yearly decreasing trend. This is because in the actual engineering situation, the pumps are usually dismantled and parts replaced in the 11th year, which dramatically enhances the performance of the pumps.

Figure 6.

Water pumps’ fault trends. (a) General failure rate curve and (b) cumulative hazard curve.

Figure 7 shows the failure curves of various pumps with the age of service. Figures 8 and 9 illustrate the failure rates of four kinds of them within the first 10 years and after the 11th year, respectively. From these three graphs, there is a certain pattern of differences between the different sorts of pumps.

Figure 7.

Failure curves of various pumps.

Figure 8.

Failure trends of four kinds of water pumps from 1 to 10 years.

Figure 9.

Failure trends of four kinds of water pumps from 11 to 18 years.

Wastewater pumps have higher failure rates in the first 2 years and in the seventh to ninth years. In the second stage, their failure rate is somewhat lower. In contrast, the central pumps are more prone to failure in the second stage of service, as evidenced by their higher failure rate in the 13th to 18th years. The lifting pumps have a lower peak failure rate than any other class of pumps: their failure rate in the 10th year is only 19.42%, while the next lowest is 32.02%. Their failure rate also shows no significant increase in the second year, indicating a shorter break-in period. The lifting pumps can therefore be put into service quickly and maintain better operational performance over the same service period.

The figures also show that the failure rate curves of the rainwater pumps and the wastewater pumps follow a similar trend, although the peak of the rainwater pumps in the 10th year is the highest. Combined with the feature VIMP results, this indicates that the service life and precipitation of the rainwater pumps are strongly correlated with their failure rate. In years with high precipitation, rainwater pumps have a higher probability of failure because of excessive workload.
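The kind of monotonic association described above between precipitation and rainwater pump failures can be checked with a Spearman rank correlation. The following is a minimal pure-Python sketch, not the implementation used in this study; the precipitation and failure-count values are hypothetical placeholders.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical yearly precipitation (mm) and rainwater pump failure counts.
precipitation = [900, 1100, 1300, 1000, 1500]
failures = [2, 3, 5, 3, 6]
rho = spearman(precipitation, failures)
print(round(rho, 3))
```

A coefficient close to 1 on real records would support treating precipitation as an input feature for rainwater pumps; it does not by itself establish that rainfall causes the failures.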

In addition, basic pump information, failure records, and environmental and meteorological information are incorporated into the model input. The verified model helps managers better understand the relationship between pump failure and its influencing factors (type, climate, service age, etc.) so that they can take action in advance. Managers who apply the model to similar situations should adapt the input features accordingly.

Finally, the model yields the failure trend of the water pumps as a whole and the change in failure rate for each pump type. Based on these results, the paper provides maintenance management insights that can help the pumps in the tunnels maintain good performance over long-term operation. We have also built a visualization platform for research applications that can serve as a reference and can improve management efficiency. The practical value of this study therefore lies in enabling preventive maintenance of water pumps, minimizing the negative impact of pump failure on the running state of the tunnel, reducing the time of road-closure maintenance operations as much as possible, and ensuring that tunnels, as key transportation infrastructure, generate lasting economic, social, and security benefits.

Management insights

The results support three recommendations for managers. Firstly, managers should plan preventive maintenance as early as possible before peak failure periods. In practice, pumping equipment is returned to the factory in batches for maintenance. The results of this paper can serve as decision support for tunnel managers and help them decide on the sequence of maintenance so as to avoid potential pump failures. Secondly, this paper suggests that there is a close relationship between precipitation and rainwater pump failure events. Managers should therefore carry out pump inspections in advance in accordance with weather forecasts, to eliminate pump failures that cause waterlogging during periods of heavy precipitation. Thirdly, maintenance work on water pumps should follow the characteristics and trends of pump performance changes rather than a fixed cycle.⁴³ The maintenance plan adopted in practical projects concentrates on the fourth to the sixth year. Although this helps decrease the failure rate of the pumps, the calculation results show that this maintenance does not fall in the critical years just before the equipment becomes prone to failure. Hence, its effect on reducing the failure rate is limited.
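The first and third recommendations can be sketched as a simple scheduling rule: skip the run-in period, find the first year in which a pump's predicted failure rate reaches a threshold, and schedule maintenance shortly before it. This is an illustrative sketch only. The threshold, the lead time, the pump identifiers, and the rate values are all hypothetical assumptions, not parameters or outputs of the model in this paper.

```python
def recommend_maintenance_year(rates, threshold, run_in_years=2, lead_years=1):
    """Return the service year in which to schedule maintenance, or None.

    rates[0] is the predicted failure rate in service year 1.
    Years inside the run-in period are ignored.
    """
    for year, rate in enumerate(rates, start=1):
        if year <= run_in_years:
            continue
        if rate >= threshold:
            # Schedule ahead of the risky year, but not inside the run-in period.
            return max(year - lead_years, run_in_years + 1)
    return None


def maintenance_sequence(pump_rates, threshold):
    """Order pump identifiers by their recommended maintenance year."""
    scheduled = []
    for pump_id, rates in pump_rates.items():
        year = recommend_maintenance_year(rates, threshold)
        if year is not None:
            scheduled.append((year, pump_id))
    return [pump_id for _, pump_id in sorted(scheduled)]


# Hypothetical predicted yearly failure rates per pump (placeholders).
pump_rates = {
    "P-001": [0.05, 0.12, 0.03, 0.03, 0.04, 0.04, 0.05, 0.06, 0.15, 0.30],
    "P-002": [0.02, 0.03, 0.04, 0.12, 0.20],
    "P-003": [0.01, 0.02, 0.02, 0.03],
}

print(maintenance_sequence(pump_rates, threshold=0.10))
```

A pump whose predicted rate never reaches the threshold is left out of the sequence, which matches the idea of replacing a fixed cycle with a schedule driven by each pump's predicted trend.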

Based on the model proposed in this paper, we designed a digital display interface (Figure 10) and a decision aid interface (Figure 11) to visualize pump failure records. In the data center, the number of pumps and the pump types are shown in a pie chart, and non-continuous historical pump failure information can also be viewed clearly. In the decision aid section, managers can schedule preventive maintenance in advance using the future failure trends of the overall pump population and of individual pumps predicted by the RSF model. Because each pump has a unique number, the maintenance advice is more targeted and actionable.

Figure 10.

Water pumps data center interface.

Figure 11.

Water pumps decision aid interface.

Conclusions

This paper establishes a failure trend prediction model to capture the failure law of pumps in tunnels. We collected and organized the basic information, fault records, and environmental and meteorological information of the pumps. The fault data were recorded manually and were found to be heavily censored. An XGBoost-based data-filling model is established for the early-stage and discontinuous failure periods, which addresses the left-censoring problem of water pump data in some old tunnels and provides data support for the subsequent failure trend prediction. In addition, because most of the tunnel equipment is still in service, the model draws on survival analysis and builds the failure trend prediction on RSF. The experimental results show that the accuracy of the XGBoost method reaches 88%, and that the RSF fault trend model performs better than the CPH model. The proposed model can predict faults over the whole service life of the equipment. Finally, based on the predicted pump fault trends, the paper summarizes the failure law and puts forward reference maintenance recommendations. Overall, the study is based on real data obtained from devices under traditional operational modes rather than on simulated data. The approach is easy to implement and provides tunnel equipment managers with preventive maintenance decision support to avoid potential pump failures. Importantly, the model is built on common equipment baseline information and failure data characteristics, and it addresses the challenges of using such data to establish accurate fault prediction models. Consequently, the model is general and has application potential for other devices.

However, several areas merit further improvement. First, the failure pre-warning ability of the model can be improved by exploring pump failure tendencies using current and voltage signals. Failure pre-warnings can then be issued according to the probability of occurrence by analyzing and predicting changes in data eigenvalues. Secondly, the accuracy of fault diagnosis can be improved, and precise fault repair strategies can be made, by analyzing the causes of equipment faults through causal inference. Finally, it would be promising to extend the framework of the proposed model with an integrated decision-making approach that combines the results of fault diagnosis and failure trend prediction, to provide tunnel equipment managers with more target-specific preventive maintenance advice, including early maintenance, replacement, or preparation of spare parts in advance.

Footnotes

Handling Editor: Chenhui Liang

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Shanghai Municipal Transportation Committee [grant numbers JT2021-KY-013].

ORCID iD

Xin Wu

Data availability

The data that support the findings of this study are available from Shanghai University and Shanghai Urban Construction (Group) Corporation Research Center for Building Industrialization, upon reasonable request.

References

1. Feng, Zhang, et al. Numerical investigation on characteristics of transient process in centrifugal pumps during power failure. Renew Energy 2021; 170: 267–276.

2. Zheng, Kan, et al. Numerical simulation and experimental study of transient characteristics in an axial flow pump during start-up. Renew Energy 2020; 146: 1879–1887.

3. Mohammed. Data driven-based model for predicting pump failures in the oil and gas industry. Eng Fail Anal 2023; 145: 107019.

4. Orsagh, Sheldon, Klenke. Prognostics/diagnostics for gas turbine engine bearings. In: IEEE aerospace conference proceedings, Montana, USA, 8–15 March 2003, pp.3095–3103. New York: IEEE.

5. Guo, Zhou, Zhang. Reliability evaluation of axial piston pump based on degradation failure. In: International conference on sensing, diagnostics, prognostics, and control (SDPC), Shanghai, China, 16–18 August 2017, pp.204–209. New York: IEEE.

6. AlShorman, Alkahatni, Masadeh, et al. Sounds and acoustic emission-based early fault diagnosis of induction motor: a review study. Adv Mech Eng 2021; 13: 1–19.

7. Ren, Tang, et al. DSmT-based three-layer method using multi-classifier to detect faults in hydraulic systems. Mech Syst Signal Process 2021; 153: 107513.

8. Shi, Ren, et al. A new multisensor information fusion technique using processed images: algorithms and application on hydraulic components. IEEE Trans Instrum Meas 2022; 71: 1–12.

9. Cui, Xiao. Aircraft hydraulic pump performance trend prediction method based on ARIMA LSTM. J Vib Meas Diagn 2021; 4: 735–740.

10. Dai, Chen, Dai. Application of singular value trend decomposition in prediction of fuel pump performance index. Transducer Microsyst Technol 2021; 40: 157–160.

11. Kimera, Nangolo FN. Improving ship yard ballast pumps’ operations: a PCA approach to predictive maintenance. Marit Transp Res 2020; 1: 100003.

12. Wang, Zheng, Xiang. Online bearing fault diagnosis using numerical simulation models and machine learning classifications. Reliab Eng Syst Saf 2023; 234: 109142.

13. Chung, Park, Kang. Fault classification and timing prediction based on shipment inspection data and maintenance reports for semiconductor manufacturing equipment. Comput Ind Eng 2023; 176: 108972.

14. Jia, Jeong JH. Deep learning for quantile regression under right censoring: DeepQuantreg. Comput Stat Data Anal 2022; 165: 107323.

15. Gammelli, Rolsted, Pacino, et al. Generalized multi-output Gaussian process censored regression. Pattern Recognit 2022; 129: 108751.

16. Gao. Nonparametric method of estimating survival functions containing right-censored and interval-censored data. J Biomed Eng 2014; 31: 267–272.

17. Canales, Wilson, Pearce-Walker, et al. Methods for handling left-censored data in quantitative microbial risk assessment. Appl Environ Microbiol 2018; 84: 1–11.

18. Shen, Zhang, et al. An artificial neural network-based data filling approach for smart operation of digital wastewater treatment plants. Environ Res 2023; 224: 115549.

19. Sharma, Rai RN. Failure modes based censored data analysis for repairable systems and its industrial perspective. Comput Ind Eng 2021; 158: 107439.

20. Alrumayh, Weera, Khogeer, et al. Optimal analysis of adaptive type-II progressive censored for new unit-lindley model. J King Saud Univ - Sci 2023; 35: 102462.

21. Kong, Wang, et al. A simplified approach for data filling in incomplete soft sets. Expert Syst Appl 2023; 213: 119248.

22. Hsieh, Chen YY. Survival function estimation of current status data with dependent censoring. Stat Probab Lett 2020; 157: 108621.

23. Mukherjee, Coad, Jana. Covariate-adjusted response-adaptive designs for censored survival responses. J Stat Plan Inference 2023; 225: 219–242.

24. Rehman, Chandra, Jammalamadaka SR. Competing risks survival data under middle censoring—An application to COVID-19 pandemic. Heal Anal 2021; 1: 100006.

25. Tashkandy, Almetwally, Ragab, et al. Statistical inferences for the extended inverse Weibull distribution under progressive type-II censored sample with applications. Alex Eng J 2023; 65: 493–502.

26. García-Mora, Debón, Santamaría, et al. Modelling the failure risk for water supply networks with interval-censored data. Reliab Eng Syst Saf 2015; 144: 311–318.

27. Zou. Composite quantile regression analysis of survival data with missing cause-of-failure information and its application to breast cancer clinical trial. Comput Stat Data Anal 2023; 182: 107711.

28. Zhong, Wang, Chen. Censored mean variance sure independence screening for ultrahigh dimensional survival data. Comput Stat Data Anal 2021; 159: 107206.

29. Pan. Feature screening and FDR control with knockoff features for ultrahigh-dimensional right-censored data. Comput Stat Data Anal 2022; 173: 107504.

30. Hemant, Udaya, Eugene, et al. Random survival forests. Ann Appl Stat 2008; 2: 1–22.

31. Taylor JM. Random survival forests. J Thorac Oncol 2011; 6: 1974–1975.

32. Ishwaran, Gerds, Kogalur, et al. Random survival forests for competing risks. Biostatistics 2014; 15: 757–773.

33. Xie, Ning, Yuan, et al. AutoScore-Survival: developing interpretable machine learning-based time-to-event scores with right-censored survival data. J Biomed Inform 2022; 125: 103959.

34. Chen, Guestrin. XGBoost: a scalable tree boosting system. In: KDD’16: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data, California, USA, 13–17 August 2016, pp.785–794. New York: ACM.

35. Zhao, Yan, Wang. Fault diagnosis of wind turbine generator based on deep autoencoder network and XGBoost. Autom Electr Power Syst 2019; 43: 81–86.

36. Chen, Gao. Using the motor power and XGBoost to diagnose working states of a sucker rod pump. J Pet Sci Eng 2021; 199: 108329.

37. Zhang, Yang. Anomaly detection and diagnosis for wind turbines using long short-term memory-based stacked denoising autoencoders and XGBoost. Reliab Eng Syst Saf 2022; 222: 108445.

38. Yuan, Chen, Xia, et al. A novel feature susceptibility approach for a PEMFC control system based on an improved XGBoost-Boruta algorithm. Energy AI 2023; 12: 100229.

39. Wang, Zhao. Application of XGBoost algorithm in prediction of wind motor main bearing fault. Electr Power Autom Equip 2019; 39: 73–77.

40. Zhang, Qian, Mao, et al. A data-driven design for fault detection of wind turbines using random forests and XGboost. IEEE Access 2018; 6: 21020–21031.

41. González-Domínguez, Sánchez-Barroso, García-Sanz-Calcedo, et al. Cox proportional hazards model used for predictive analysis of the energy consumption of healthcare buildings. Energy Build 2022; 257: 111784.

42. Showkat, Singh. Perceiving moisture damage of asphalt mixes containing RAP using survival analysis based on Kaplan-Meier estimator and Cox proportional hazards model. Constr Build Mater 2022; 320: 126249.

43. Sahal, Breslin, Ali MI. Big data and stream processing platforms for Industry 4.0 requirements mapping for a predictive maintenance use case. J Manuf Syst 2020; 54: 138–151.
