Abstract

Tread wear rates of the right and left wheels of a wheelset are not the same because of the complexity of the track condition, which causes the wheel diameter difference (WDD). The WDD can influence vehicle dynamic performances and shorten the service life of the wheelset. To diagnose and recognize the condition of the WDD in time, a data-driven method based on multi-sensor information fusion is proposed. Different statistical features are extracted from the time and frequency domains of the axle-box acceleration signals. The features can be fused by integrating stacked autoencoder and multiple kernel learning. The comparative experimental analysis shows that compared with other commonly used intelligent methods, the proposed method can achieve higher diagnostic accuracy and give better performance with small training sample sizes. The statistical features sensitive to the WDD are also analyzed for industrial application.

Keywords

Wheel diameter difference, fault diagnosis, autoencoder, multiple kernel learning, freight wagons

1. Introduction

In railway transportation, freight wagons are widely used for carrying cargo (Molatefi et al., 2019). A wheel diameter difference (WDD) frequently develops in railway vehicles during long-term operation because the two wheels of a wheelset wear at different rates. A growing WDD degrades the dynamic performance of the vehicle and leads to rolling contact fatigue of the wheel tread (Li et al., 2021b; Lyu et al., 2020). Therefore, it is necessary to monitor the condition of the WDD accurately. The WDD of a wheelset can be obtained by comparing the diameter and profile of both wheels. In general, wheel diameter measurement for railway vehicles falls into two categories: (1) static measurement and (2) dynamic measurement. In static measurement, the vehicle is stationary and the measurement sensors are portable. At present, condition monitoring of the WDD mainly relies on periodic manual measurement, which is time-consuming and unreliable. In dynamic measurement, the sensors are mounted onboard or at the wayside. In recent decades, several dynamic, non-contact measurement methods have been proposed (Torabi et al., 2018; Yan et al., 2014). However, the precision and robustness of apparatus based on optical or machine-vision techniques are affected by environmental factors such as light, dust, and rain.

Traditional condition monitoring methods for railway vehicles are mainly based on time- or frequency-domain analysis of vibration signals (Chen et al., 2020, 2021; Lei and Wang, 2020; Ke et al., 2021; Shi et al., 2020). Unlike other railway faults, such as wheel flats and wheel polygons, the WDD has no definite characteristic frequency under the interference of random track irregularity. Therefore, it is difficult to detect the WDD with traditional time and frequency analysis methods. Many data-driven fault diagnosis methods have been developed based on a single sensor. However, a single sensor often cannot capture the condition of the equipment fully because of its limited installation location and coverage range. To enhance diagnostic performance by fusing multi-sensor data, several information fusion schemes have been proposed (Li et al., 2021a; Jablon et al., 2021; Ma et al., 2021; Suj et al., 2021). These methods can be divided into three forms: data-layer fusion, feature-layer fusion, and decision-layer fusion. Compared with the other strategies, feature-layer fusion offers better fault tolerance and handles heterogeneous features more effectively. It fuses the features extracted from the fault signals, which also compresses a large amount of data.

An autoencoder (AE) (Wu et al., 2021) is a kind of artificial neural network that learns a representation by encoding the input data in an unsupervised manner. Compared with other methods, the AE has a more powerful feature fusion capability and can obtain a deeper, more general representation of the input data. Kernel methods represent a well-established learning paradigm. Multiple kernel learning (MKL) replaces a single kernel with a combination of multiple base kernels (Nen et al., 2011). By mapping the data into a higher-dimensional space, the combined kernel can capture complex nonlinear patterns in the input data. By taking advantage of the feature fusion ability of the stacked autoencoder (SAE) and the pattern classification capacity of MKL, an intelligent fault diagnosis method is proposed and applied to the WDD of freight wagons. The main contributions of this work can be summarized as follows:

1. For fault diagnosis of the WDD, a novel multi-sensor information fusion method is proposed by integrating SAE and MKL.

2. The proposed method is validated by the simulation and experimental data. Compared with some well-known methods, the proposed method improves the diagnosis performance of the WDD, especially in the case of small training sample size.

3. A detailed analysis is conducted about the application conditions of the proposed method, such as operation speed and sensitive features, which indicates that the proposed method has the capability of industrial application.

The rest of the paper is organized as follows. The basic theory is briefly introduced in Section 2. Section 3 details the proposed method. In Section 4, the effectiveness of the proposed method is validated with simulation data and experimental data. Finally, general conclusions are given in Section 5.

2. Theoretical background

2.1. Autoencoder

The common architecture of an autoencoder (AE) is a three-layer neural network consisting of the input layer, the hidden layer, and the output layer. The hidden layer has fewer neurons than the other layers. Therefore, the input data $x \in R^{n}$ can be transformed into a lower dimensional hidden representation $h \in R^{d}$ (d < n) as follows

\begin{array}{l} h = f^{[h]} (W^{[h]} x + b^{[h]}) \end{array}

(1)

where $f^{[h]} (\cdot)$, $b^{[h]}$, and $W^{[h]}$ are the activation function, bias vector, and weight matrix of the hidden layer, respectively.

The latter portion of the AE transforms the hidden-layer features into a reconstruction vector $\hat{x} \in R^{n}$ as follows

\begin{array}{l} \hat{x} = f^{[o]} (W^{[o]} h + b^{[o]}) \end{array}

(2)

where $f^{[o]} (\cdot)$, $b^{[o]}$, and $W^{[o]}$ are the activation function, bias vector, and weight matrix of the output layer, respectively.

The parameters of the AE are trained by minimizing the mean squared error (MSE) loss. For a given set of data $\{x^{(i)}\}_{i = 1}^{m}$, the cost function is the average MSE loss

\begin{array}{l} J_{M S E} (W, b) = \frac{1}{m} \sum_{i = 1}^{m} (\frac{1}{2} {‖ x^{(i)} - {\hat{x}}^{(i)} ‖}^{2}) \end{array}

(3)

The reconstruction error can also be measured by the binary cross-entropy loss (Wu et al., 2021), which can be expressed as follows

\begin{array}{l} J_{B C} (W, b) = - \frac{1}{m n} \sum_{i = 1}^{m} \sum_{j = 1}^{n} [x_{j}^{(i)} \log {\hat{x}}_{j}^{(i)} + (1 - x_{j}^{(i)}) \log (1 - {\hat{x}}_{j}^{(i)})] \end{array}

(4)
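Equations (1) to (4) can be illustrated with a minimal sketch using only the Python standard library. The sigmoid activation, the layer sizes, and the random weights are assumptions chosen for illustration; they are not taken from the paper, and no training step is shown.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, used here for both f^[h] and f^[o] (an assumption)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W v + b for a list-of-rows matrix W and vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def encode(x, W_h, b_h):
    """Equation (1): hidden representation h of dimension d < n."""
    return [sigmoid(z) for z in affine(W_h, x, b_h)]


def decode(h, W_o, b_o):
    """Equation (2): reconstruction x_hat of dimension n."""
    return [sigmoid(z) for z in affine(W_o, h, b_o)]


def mse_loss(X, X_hat):
    """Equation (3): mean over samples of half the squared error norm."""
    return sum(0.5 * sum((a - b) ** 2 for a, b in zip(x, xh))
               for x, xh in zip(X, X_hat)) / len(X)


def bce_loss(X, X_hat, eps=1e-12):
    """Equation (4): binary cross-entropy averaged over m samples and n inputs."""
    total = 0.0
    for x, xh in zip(X, X_hat):
        for a, b in zip(x, xh):
            b = min(max(b, eps), 1.0 - eps)  # keep log() finite
            total += a * math.log(b) + (1.0 - a) * math.log(1.0 - b)
    return -total / (len(X) * len(X[0]))


def random_matrix(rows, cols, rng):
    """Small random weights; a placeholder for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d = 6, 3  # hypothetical sizes, chosen only so that d < n
W_h, b_h = random_matrix(d, n, rng), [0.0] * d
W_o, b_o = random_matrix(n, d, rng), [0.0] * n
X = [[rng.random() for _ in range(n)] for _ in range(4)]  # inputs scaled to [0, 1]
X_hat = [decode(encode(x, W_h, b_h), W_o, b_o) for x in X]
print(mse_loss(X, X_hat), bce_loss(X, X_hat))
```

The cross-entropy form in equation (4) presumes inputs scaled to [0, 1], which is why the toy inputs are drawn from that interval.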

After the encoding and decoding process of the AE, the hidden layer is expected to extract the more relevant and essential features of the input data. To handle more complex data, several improved autoencoder variants have been proposed, such as the stacked autoencoder (SAE), denoising autoencoder (DAE), and variational autoencoder (VAE). The SAE, a cascade of multiple AEs, is used to extract features layer by layer (Sun et al., 2021). In the pre-training stage of the SAE, the features extracted by one AE are fed as input to the next AE. Pre-training places the initial weights of the SAE network in a suitable state, which helps the iterative convergence in the supervised stage. When the hidden layer has fewer nodes than the input layer, the input information can be fused and represented by low-dimensional features.

2.2. Multiple kernel learning

The most frequently used kernel method is the support vector machine (SVM). In general, an SVM with a single kernel function is used for binary classification. The SVM algorithm seeks the hyperplane that separates the classes with the maximum margin, which is bounded by two planes parallel to the hyperplane. However, for multi-class tasks, the features projected by a single kernel function may not be distinguishable between classes. MKL has been shown to improve the discriminative and generalization performance of the classifier. Within this framework, the problem of feature representation is converted into optimizing the combination weights of multiple kernel matrices. Consequently, MKL can achieve multi-source information fusion by combining kernel functions. The MKL-based SVM (MKL-SVM) learns both the kernel combination weights and the decision boundaries in a single optimization problem, so the weight of each kernel function is learned together with the SVM objective.

3. Methodology

This section details the proposed fault diagnosis strategy for the WDD. Figure 1 shows the framework of the proposed method. Firstly, the vibration signals of the wheelset of different WDD are collected. Secondly, multiple domain features of the vibration signals are extracted and an ensemble feature-level fusion model is proposed by integrating the SAE and MKL-SVM. The model is pre-trained by the unlabeled data and fine-tuned by a small amount of labeled data. Finally, the extracted features are input to the proposed model for feature fusion and fault diagnosis.

Figure 1.

Framework of the proposed method.

3.1. Multi-domain feature extraction

Many indicators have been proposed for the condition monitoring of rotating machinery. However, few indicators focus on condition monitoring of the WDD. The main reason is that most indicators aim at recognizing impulse signals. The WDD, especially a small one, does not cause obvious periodic wheel–rail impact vibration, so it can hardly be distinguished in the frequency domain or time–frequency domain. Therefore, a wide range of indicators is selected as the statistical features in this study, as listed in Table 1. The originally collected signals are segmented into equal-length segments by a sliding window. Each segment is further transformed into the frequency domain, and one sample of time- and frequency-domain features is extracted from each segment. Adjacent segments overlap in order to obtain more samples. (The process can be found in Supplementary Figure S1 in the supplementary information file.)
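The overlapping segmentation can be sketched as follows. The window length of 200 and overlap of 100 follow the values listed in Table 3, assuming they are counted in data points; the function name and the placeholder signal are hypothetical.

```python
def sliding_windows(signal, window=200, overlap=100):
    """Split a 1-D signal into equal-length, overlapping segments."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap  # hop between the starts of adjacent segments
    return [signal[s:s + window]
            for s in range(0, len(signal) - window + 1, step)]


signal = [float(i) for i in range(1000)]  # placeholder for an acceleration record
segments = sliding_windows(signal)
print(len(segments), len(segments[0]), segments[1][0])  # 9 200 100.0
```

With a 50% overlap, a record of 1000 points yields 9 segments instead of the 5 obtained without overlap, which is how the overlap increases the number of samples.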

Table 1.

Statistical features in the time and frequency domains.

Number	Time domain features	Number	Frequency domain features
1	$F_{1} = x_{\max} - x_{\min}$	15	$F_{15} = \frac{1}{M} \sum_{j = 1}^{M} y_{j}$
2	$F_{2} = {(\frac{1}{N} \sum_{i = 1}^{N} {(x_{i})}^{2})}^{1 / 2}$	16	$F_{16} = \frac{1}{M - 1} \sum_{j = 1}^{M} {(y_{j} - F_{15})}^{2}$
3	$F_{3} = F_{1} / \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(x_{i})}^{2}}$	17	$F_{17} = \frac{\sum_{j = 1}^{M} {(y_{j} - F_{15})}^{3}}{M (\sqrt{{F_{16}}^{3}})}$
4	$F_{4} = \frac{1}{N} \sum_{i = 1}^{N} {(x_{i} - \bar{x})}^{2}$	18	$F_{18} = \frac{\sum_{j = 1}^{M} {(y_{j} - F_{15})}^{4}}{M ({F_{16}}^{2})}$
5	$F_{5} = \frac{N \sum_{i = 1}^{N} {(x_{i} - \bar{x})}^{4}}{{(\sum_{i = 1}^{N} {(x_{i} - \bar{x})}^{2})}^{2}}$	19	$F_{19} = \frac{\sum_{j = 1}^{M} (f_{j} * y_{j})}{\sum_{j = 1}^{M} y_{j}}$
6	$F_{6} = F_{1} / (\frac{1}{N} \sum_{i = 1}^{N} | x_{i} |)$	20	$F_{20} = \frac{1}{M} \sum_{j = 1}^{M} \sqrt{{(f_{j} - F_{19})}^{2} * y_{j}}$
7	$F_{7} = F_{2} / (\frac{1}{N} \sum_{i = 1}^{N} | x_{i} |)$	21	$F_{21} = \frac{\sqrt{\sum_{j = 1}^{M} ({f_{j}}^{2} * y_{j})}}{\sum_{j = 1}^{M} y_{j}}$
8	$F_{8} = \frac{1}{N} \sum_{i = 1}^{N} {(x_{i} - \bar{x})}^{3}$	22	$F_{22} = \frac{\sqrt{\sum_{j = 1}^{M} ({f_{j}}^{4} * y_{j})}}{\sum_{j = 1}^{M} ({f_{j}}^{2} * y_{j})}$
9	$F_{9} = \frac{N \sum_{i = 1}^{N} {(r e_{i} - \bar{r e})}^{4}}{{(\sum_{i = 1}^{N} {(r e_{i} - \bar{r e})}^{2})}^{2}}$	23	$F_{23} = \frac{\sum_{j = 1}^{M} ({f_{j}}^{2} * y_{j})}{\sqrt{\sum_{j = 1}^{M} ({f_{j}}^{4} * y_{j})} * \sum_{j = 1}^{M} (y_{j})}$
10	$F_{10} = \frac{1}{N} \sum_{i = 1}^{N} | x_{i} |$	24	$F_{24} = F_{20} / F_{19}$
11	$F_{11} = F_{1} / (\frac{1}{N} \sum_{i = 1}^{N} | x_{i} |)$	25	$F_{25} = \frac{1}{M * {F_{20}}^{3}} \sum_{j = 1}^{M} {(f_{j} - F_{19})}^{3} * y_{j}$
12	$F_{12} = - \sum_{i = 1}^{N} p (x_{i}) \log (p (x_{i}))$	26	$F_{26} = \frac{1}{M * {F_{20}}^{4}} \sum_{j = 1}^{M} {(f_{j} - F_{19})}^{4} * y_{j}$
13	$F_{13} = \ln ϕ^{m} (t) - \ln ϕ^{m + 1} (t)$	27	$F_{27} = \frac{1}{M * \sqrt{F_{20}}} \sum_{j = 1}^{M} \sqrt{| f_{j} - F_{19} |} * y_{j}$
14	$F_{14} = \ln ϕ^{m} (t) - \ln ϕ^{m + 1} (t)$

where $x_{i}$ is the signal series, N is the number of data points, $\bar{x}$ is the average of $x_{i}$, $r e_{i}$ equals ${x_{i}}^{2} - x_{i - 1} x_{i + 1}$, $\bar{r e}$ is the average of $r e_{i}$, $p (x_{i})$ is the probability of $x_{i}$, $y_{j}$ is the spectrum value of the j-th spectrum line, $f_{j}$ is the frequency of the j-th spectrum line, and M is the number of spectrum lines.
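As an illustration, a few of the features in Table 1 can be computed for one segment as sketched below. Only $F_{1}$, $F_{2}$, $F_{4}$, $F_{5}$, $F_{10}$, $F_{15}$, and $F_{19}$ are shown. The direct DFT, the amplitude scaling of the spectrum, and the 50 Hz test tone are assumptions made for this example; the paper does not state how the spectrum is computed.

```python
import cmath
import math


def time_features(x):
    """A subset of the time-domain features in Table 1."""
    N = len(x)
    mean = sum(x) / N
    m2 = sum((v - mean) ** 2 for v in x)
    m4 = sum((v - mean) ** 4 for v in x)
    return {
        "F1": max(x) - min(x),                      # peak-to-peak value
        "F2": math.sqrt(sum(v * v for v in x) / N),  # root mean square
        "F4": m2 / N,                               # variance
        "F5": N * m4 / m2 ** 2,                     # kurtosis
        "F10": sum(abs(v) for v in x) / N,          # mean absolute value
    }


def amplitude_spectrum(x, fs):
    """One-sided amplitude spectrum by a direct DFT (adequate for short segments)."""
    N = len(x)
    freqs, amps = [], []
    for k in range(N // 2 + 1):
        X_k = sum(x[i] * cmath.exp(-2j * math.pi * k * i / N) for i in range(N))
        freqs.append(k * fs / N)  # f_j: frequency of the spectrum line
        amps.append(abs(X_k) / N)  # y_j: spectrum value of the line
    return freqs, amps


def frequency_features(freqs, amps):
    """F15 (mean spectrum value) and F19 (frequency centroid) from Table 1."""
    M = len(amps)
    return {
        "F15": sum(amps) / M,
        "F19": sum(f * y for f, y in zip(freqs, amps)) / sum(amps),
    }


fs = 2000.0  # sampling frequency reported for the field measurement
x = [math.sin(2 * math.pi * 50.0 * i / fs) for i in range(200)]  # toy 50 Hz tone
tf = time_features(x)
ff = frequency_features(*amplitude_spectrum(x, fs))
print(tf)
print(ff)
```

For a pure tone the frequency centroid $F_{19}$ coincides with the tone frequency and the kurtosis $F_{5}$ equals 1.5, which gives a quick sanity check of the formulas.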

3.2. Feature fusion based on SAE and MKL-SVM

The training process of the SAE consists of two steps: unsupervised pre-training and supervised fine-tuning (Li et al., 2020). In the pre-training stage, each AE is trained separately. By minimizing the reconstruction error layer by layer, the hidden features are extracted without supervision. In the fine-tuning stage, only a small amount of labeled data is used to train the classifier in a supervised manner. Figure 2 illustrates the typical learning process of the SAE. For the l-th layer of the SAE, the encoder produces the feature $h_{l}$ according to equation (5). The decoder reconstructs the input $h_{l - 1}$ of the l-th layer from $h_{l}$ according to equation (6). In the pre-training stage, the features are obtained by minimizing the reconstruction error expressed in equation (7).

\begin{array}{l} h_{l} = f (W_{l, 1} h_{l - 1} + b_{l, 1}) \end{array}

(5)

\begin{array}{l} {\hat{h}}_{l - 1} = \hat{f} (W_{l, 2} h_{l} + b_{l, 2}) \end{array}

(6)

\begin{array}{l} J = \sum \sum {‖ {\hat{h}}_{l - 1}^{j} - h_{l - 1}^{j} ‖}^{2} / 2 N_{s} \end{array}

(7)

where $W_{l, 1}$ and $W_{l, 2}$ are the weight matrices, $b_{l, 1} \in R^{d_{l}}$ and $b_{l, 2} \in R^{d_{l - 1}}$ are the bias vectors, and $f (\cdot)$ and $\hat{f} (\cdot)$ are the activation functions.

Figure 2.

Architecture and training process of the SAE.

A sparse constraint based on the Kullback–Leibler divergence (Vincent et al., 2010) is introduced to the SAE to avoid overfitting, as expressed in equations (8) and (9)

\begin{array}{l} K L (ρ ‖ {\hat{ρ}}_{l}) = ρ \log \frac{ρ}{{\hat{ρ}}_{l}} + (1 - ρ) \log \frac{1 - ρ}{1 - {\hat{ρ}}_{l}} \end{array}

(8)

\begin{array}{l} {\hat{ρ}}_{l} = \sum_{j = 1}^{N_{s}} h_{l} (x^{j}) / N_{s} \end{array}

(9)

where ρ and ${\hat{ρ}}_{l}$ are the sparsity parameter and average activation, respectively.

The reconstruction error can be described as

\begin{array}{l} J_{s} = J + β K L (ρ ‖ {\hat{ρ}}_{l}) \end{array}

(10)

where β is the coefficient of sparsity weight.
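The sparsity penalty can be sketched numerically as follows. The value ρ = 0.03 is the sparsity parameter listed in Table 3; the value of β, the toy activations, and the summation of the penalty over the hidden units of a layer are assumptions made for this example. The activations must lie strictly between 0 and 1, as they would for a sigmoid hidden layer.

```python
import math


def average_activation(H):
    """Equation (9): mean activation of each hidden unit over N_s samples."""
    n_samples = len(H)
    return [sum(h[k] for h in H) / n_samples for k in range(len(H[0]))]


def kl_sparsity(rho, rho_hat):
    """Equation (8), summed over the hidden units of one layer (an assumption)."""
    return sum(rho * math.log(rho / r)
               + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - r))
               for r in rho_hat)


def sparse_cost(J, H, rho=0.03, beta=1.0):
    """Equation (10): reconstruction error plus the weighted sparsity penalty."""
    return J + beta * kl_sparsity(rho, average_activation(H))


H = [[0.02, 0.60], [0.04, 0.40]]  # toy hidden activations for N_s = 2 samples
print(sparse_cost(J=0.10, H=H))
```

The penalty vanishes for a unit whose average activation equals ρ and grows as the average activation moves away from ρ, which is what drives most hidden units toward low activity.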

For the fine-tuning process of the SAE, the hidden layers are initialized with the weights and biases obtained in pre-training, which helps avoid vanishing or exploding gradients compared with random initialization. The loss function of the classifier is shown in equation (11).

\begin{array}{l} J_{s} = - \sum_{i} y_{i} \log (p_{i}) \end{array}

(11)

where $p_{i}$ is the posterior probability of the i-th sample.

For the problem of binary classification, suppose that we are given the samples $\{(x_{i}, y_{i})\}_{i = 1}^{n}$, where $x_{i} \in X$ is the input data and $y_{i} \in \{+ 1, - 1\}$ is the label. The optimal hyperplane is obtained by solving the following optimization problem

\begin{array}{l} \begin{cases} \min \frac{1}{2} {‖ w ‖}^{2} + C \sum_{i = 1}^{n} ξ_{i} \\ \text{s.t.} \; y_{i} (w^{T} φ (x_{i}) + b) \geq 1 - ξ_{i} \\ ξ_{i} \geq 0, i = 1, \dots, n \end{cases} \end{array}

(12)

where C is the regularization parameter and $ξ = {(ξ_{1}, ξ_{2}, \cdots, ξ_{n})}^{T}$ is the vector of slack variables. Suppose that $α_{i}$ is the Lagrange multiplier for the i-th inequality in equation (12); the dual problem of equation (12) can then be expressed as follows

\begin{array}{l} \begin{cases} \max \sum_{i = 1}^{n} α_{i} - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} y_{i} y_{j} α_{i} α_{j} k (x_{i}, x_{j}) \\ \text{s.t.} \; \sum_{i = 1}^{n} α_{i} y_{i} = 0 \\ 0 \leq α_{i} \leq C, i = 1, \dots, n \end{cases} \end{array}

(13)

where $k (x_{i}, x_{j}) = φ (x_{i}) \cdot φ (x_{j})$. After obtaining the solution of equation (13), the decision function of the SVM can be described as follows

\begin{array}{l} f (x) = \text{sgn} (\sum_{i = 1}^{n} α_{i} y_{i} k (x_{i}, x) + b) \end{array}

(14)

where the samples $x_{i}$ with $α_{i} > 0$ are the support vectors.

Unlike the optimization criterion with a single fixed kernel, MKL constructs the kernel k by combining a set of predefined kernels (Bucak et al., 2014; Wang et al., 2021). The feature map takes the form $φ (\cdot) = {[φ_{1}^{T} (\cdot), \cdots, φ_{M}^{T} (\cdot)]}^{T}$, which is generated by $M$ predefined base kernels $\{k_{m} (\cdot, \cdot)\}_{m = 1}^{M}$. The predefined base kernels can be built from one kernel type with different parameters or from different types of kernels. The linear combination of these kernels can be described by

\begin{array}{l} k = \sum_{m = 1}^{M} μ_{m} k_{m} \end{array}

(15)

where $μ_{m}$ is the combination coefficient. With $μ = {(μ_{1}, \cdots, μ_{M})}^{T}$, an additional constraint is imposed on the norm of μ as follows

\begin{array}{l} {‖ μ ‖}_{p} = {(\sum_{m = 1}^{M} {| μ_{m} |}^{p})}^{1 / p} = 1 \end{array}

(16)

Finally, the decision function for MKL-SVM can be expressed by

\begin{array}{l} f (x) = \text{sgn} (\sum_{i = 1}^{n} α_{i} y_{i} \sum_{m = 1}^{M} μ_{m} k_{m} (x_{i}, x) + b) \end{array}

(17)

The MKL-SVM has three groups of parameters: the kernel weights, the penalty constant, and the kernel parameters. The SimpleMKL algorithm (Rakotomamonjy et al., 2008) based on the $L_{2}$-norm of the kernel weights is adopted in this paper.
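The combined kernel of equation (15), the norm constraint of equation (16), and the decision function of equation (17) can be sketched as follows. This is not the SimpleMKL solver: the kernel widths, the kernel weights, the multipliers, and the bias are hand-picked placeholders rather than learned values, and the Gaussian kernel parameterization is an assumption.

```python
import math


def gaussian_kernel(sigma):
    """Return k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) (assumed parameterization)."""
    def k(a, b):
        sq = sum((u - v) ** 2 for u, v in zip(a, b))
        return math.exp(-sq / (2.0 * sigma ** 2))
    return k


def normalize_weights(mu, p=2):
    """Scale the weights so that ||mu||_p = 1, as required by equation (16)."""
    norm = sum(abs(m) ** p for m in mu) ** (1.0 / p)
    return [m / norm for m in mu]


def combined_kernel(kernels, mu):
    """Equation (15): k = sum_m mu_m k_m."""
    def k(a, b):
        return sum(m * km(a, b) for m, km in zip(mu, kernels))
    return k


def decide(x, support_vectors, labels, alphas, b, kernel):
    """Equation (17): sign of the kernel expansion evaluated at a new point x."""
    s = sum(a * y * kernel(sv, x)
            for a, y, sv in zip(alphas, labels, support_vectors)) + b
    return 1 if s >= 0 else -1


kernels = [gaussian_kernel(s) for s in (0.5, 1.0, 2.0)]  # hypothetical widths
mu = normalize_weights([1.0, 1.0, 1.0])                  # equal weights, L2-normalized
k = combined_kernel(kernels, mu)
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
labels, alphas, b = [-1, +1], [1.0, 1.0], 0.0            # hand-picked, not learned
print(decide([0.9, 0.9], support_vectors, labels, alphas, b, k))
print(decide([0.1, 0.1], support_vectors, labels, alphas, b, k))
```

In an actual MKL-SVM the weights μ and the multipliers α would be obtained jointly by the solver; the sketch only shows how they enter the decision once they are known.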

4. Simulation and experimental verification

4.1. Simulation validation

The effect of the WDD is first analyzed from the viewpoint of dynamics, and the effectiveness of the proposed method is validated with simulation data. The dynamics model of the freight wagon is developed in the multi-body simulation software SIMPACK (the structure and detailed parameters of the model can be found in Supplementary Figure S2 and Supplementary Table S1 in the supplementary information file). The main parameters of the dynamic model are given in Table 2. Kalker’s FASTSIM algorithm (Enblom et al., 2016) is adopted to solve the tangential creep force of the wheel–rail contact. The WDD is set only for the front wheelset of the front bogie. The axle-box acceleration at the right wheel, which has the smaller diameter, is collected. The WDDs range from 0 mm to 4 mm at intervals of 1 mm. The model is simulated at a constant speed of 80 km/h on straight track, and the random track excitation is the American fifth-grade track irregularity.

Table 2.

Main parameters of freight wagons.

Parameters	Values	Units	Parameters	Values	Units
Carbody mass	10297	kg	Primary longitudinal stiffness	11 × 10⁶	MN/m
Bogie mass	497	kg	Primary lateral stiffness	13 × 10⁶	MN/m
Wheelset mass	1171	kg	Primary vertical stiffness	160 × 10⁶	MN/m
Bolster mass	745	kg	Secondary longitudinal stiffness	1.818 × 10⁶	MN/m
Wheelset diameter	0.84	m	Secondary lateral stiffness	1.818 × 10⁶	MN/m
Bogie distance	8.2	m	Secondary vertical stiffness	2.233 × 10⁶	MN/m

The axle-box lateral and vertical accelerations for different WDDs are illustrated in Figure 3(a) and (b). The amplitudes of the impulse signals excited by random irregularity increase with increasing WDD. For the vertical acceleration, the difference between the signals of different WDDs is relatively small. Figure 4 presents the wheel/rail contact position for different WDDs. The result indicates that, as the WDD increases, the contact position moves toward the flange for the small wheel and vice versa for the large wheel. The equivalent conicity is higher in the zone close to the flange, which aggravates the lateral vibration of the vehicle. Because of the automatic balance characteristic of the wheelset during vehicle operation, the lateral displacement of the wheelset keeps increasing as the WDD grows. When a wheelset with a large WDD moves forward at high speed, the flange collides with the rail frequently, resulting in severe impacts and affecting the safety of railway vehicles.

Figure 3.

Axle-box acceleration of different wheel diameter differences: (a) lateral and (b) vertical.

Figure 4.

Wheel/rail contact position: (a) right wheel: small diameter and (b) left wheel: large diameter.

There are 365 (73 for each class of WDDs) samples of the dataset based on the simulation signals. The main predefined hyperparameters used in the model are listed in Table 3. Sensor 1 is used to measure the lateral acceleration and sensor 2 is used to measure the vertical acceleration. Figure 5(a)–(c) illustrates the t-SNE representations of the fused feature samples of sensor 1, sensor 2, and sensor 1&2, respectively. It can be found that the samples of a single sensor are relatively concentrated, but some samples are still difficult to separate. The feature samples fusing the information of multiple sensors can be effectively separated.

Table 3.

Main hyperparameters of the model.

Hyperparameter	Value	Hyperparameter	Value
Network structure parameters	27-20-10-5	Kernel function	Gaussian
Learning rate	0.1	Penalty parameter	6.21
Momentum	0.3	Kernel function parameter	3.27
Iteration number	200	Tradeoff parameter	0.52
Maximum epochs	100	Data length of sliding window	200
Sparsity parameter	0.03	Overlap of the data segmentation	100

Figure 5.

Visualization of the fused features: (a) sensor 1; (b) sensor 2; and (c) sensor 1&2.

In the learning process of multi-sensor fusion, the distributions of feature samples are illustrated in Figure 6. Figure 6(a) gives the 2D representations of raw features without feature fusion. Figure 6(b)–(e) shows the 2D representations of the encoded features given by four hidden layers (SAE1, SAE2, SAE3, and SAE4). Figure 6(f) illustrates the distribution of the final output features fused by SAE and MKL. It can be found that the representations of raw features are extremely disordered and cannot be separated directly. The difference between the features of different WDDs is not obvious. The distribution of the features extracted by the encoder is more organized, and the features of the same WDD condition cluster closely. Finally, all the output features of different WDD conditions are separated clearly. This good separation of features leads to more accurate fault diagnosis of the WDD. The results reveal that the proposed method has a strong capacity for learning discriminative representations from the input data. The confusion matrix of the proposed method is given in Figure 7. As can be seen, the accuracy of the multi-sensor information fusion method is 97.3%. These results show that the proposed method can capture the representative features and distinguish the WDD.

Figure 6.

Two-dimensional features visualization: (a) raw feature, (b) SAE1, (c) SAE2, (d) SAE3, (e) SAE4, and (f) output features. SAE: stacked autoencoder.

Figure 7.

Confusion matrix based on simulation data.
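For reference, a confusion matrix and the overall accuracy derived from it can be computed as sketched below. The five labels correspond to the simulated WDD classes (0 mm to 4 mm); the true and predicted labels are toy values invented for illustration and are not the data behind Figure 7.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows index the true class, columns index the predicted class."""
    index = {lab: i for i, lab in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(map(sum, cm))


labels = [0, 1, 2, 3, 4]                 # WDD classes in mm
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]  # toy labels, not the paper's data
y_pred = [0, 0, 1, 2, 2, 2, 3, 3, 4, 3]  # toy predictions
cm = confusion_matrix(y_true, y_pred, labels)
for row in cm:
    print(row)
print(accuracy(cm))  # 0.8
```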

4.2. Experimental analysis

Three freight wagons with different WDDs (0.8 mm, 2.1 mm, and 4 mm) are measured, with acceleration sensors mounted on their axle-boxes, as shown in Figure 8. The lateral and vertical axle-box accelerations are measured at a sampling frequency of 2000 Hz. During the experiment, the freight wagons are accelerated to 75 km/h, run at a constant speed, and then slowed down to 0 km/h. In total, 7000 samples are extracted from the acceleration signals of each wagon, including 800 samples from the acceleration phase and 900 samples from the deceleration phase.

Figure 8.

Field measurement: (a) location of sensors and (b) speed.

For the wagons with different WDDs, the raw axle-box acceleration signals are used to calculate the statistical features described above (the distribution of features for different WDDs can be found in Supplementary Figure S3 in the supplementary information file). The distributions of the features differ significantly. Some features, such as peak–peak value ( $F_{1}$ ), RMS ( $F_{2}$ ), sample entropy ( $F_{13}$ ), and fuzzy entropy ( $F_{14}$ ), increase with increasing speed. Shape factor ( $F_{7}$ ) and skewness ( $F_{8}$ ) are less sensitive to the speed of the freight wagon. However, there is no clear distinction between the features of different WDDs. Therefore, an intelligent identification and diagnosis method for the WDD is necessary.

4.2.1. Effect of operation speed

To analyze the effect of operation speed, the features extracted at different operation speeds are employed to recognize the WDD condition of freight wagons. Over the whole experiment, 500 consecutive samples are input into the model at a time. The diagnostic performance based on multi-sensor fusion is compared with that based on a single sensor, as shown in Figure 9. The results indicate that the classification error of the samples at constant speed is lower than that during acceleration or deceleration. The minimum classification error is 7.61%, which is given by multi-sensor fusion. It should be noted that the features extracted from the vibration signals are influenced not only by operation speed but also by complicated track conditions. Consequently, the classification errors differ somewhat among samples collected at the same operation speed. The samples in the range of 2000–3000, which give the best classification performance, are selected for further study.

Figure 9.

Influence of the speed on the wheel diameter difference diagnosis.

4.2.2. Sensitive feature analysis

In general, better classification performance and computational efficiency can be obtained by discarding irrelevant features. Figure 10(a) gives the importance of the features, evaluated by the ReliefF algorithm (Huang, 2012). Figure 10(b) shows the performance evolution with different features. The proposed multi-sensor fusion fault diagnosis method gives a classification accuracy of 95.15% with eight ranked features. The classification accuracies of the two single-sensor models are 93.37% (18 ranked features) and 91.74% (19 ranked features), respectively. The results reveal that fault diagnosis based on the information of a single sensor is limited in recognizing the WDD condition. The proposed method offers better performance with fewer features, which greatly improves the computational efficiency.
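The idea behind relevance-based feature ranking can be illustrated with the basic Relief algorithm, sketched below. This is a simplified relative of ReliefF (one nearest hit and one nearest miss per draw, no k-neighbor averaging and no class-prior weighting), so it should not be read as the procedure used to produce Figure 10. The toy data, the function name, and the iteration count are hypothetical.

```python
import random


def relief_scores(X, y, n_iter=100, seed=0):
    """Basic Relief feature weights using one nearest hit and one nearest miss."""
    rng = random.Random(seed)
    columns = list(zip(*X))
    span = [(max(c) - min(c)) or 1.0 for c in columns]  # per-feature range

    def diff(a, b):
        # Range-normalized absolute difference per feature.
        return [abs(u - v) / s for u, v, s in zip(a, b, span)]

    def dist(a, b):
        return sum(diff(a, b))

    w = [0.0] * len(span)
    for _ in range(n_iter):
        i = rng.randrange(len(X))
        hits = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [j for j in range(len(X)) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        d_hit, d_miss = diff(X[i], X[hit]), diff(X[i], X[miss])
        for k in range(len(w)):
            # Reward features that differ across classes, penalize within-class spread.
            w[k] += (d_miss[k] - d_hit[k]) / n_iter
    return w


rng = random.Random(0)
X, y = [], []
for label in (0, 1):
    for _ in range(20):
        # Feature 0 follows the class; feature 1 is pure noise (toy data).
        X.append([label + rng.uniform(-0.1, 0.1), rng.random()])
        y.append(label)
w = relief_scores(X, y)
ranking = sorted(range(len(w)), key=lambda k: w[k], reverse=True)
print(w, ranking)
```

Features would then be added to the classifier in ranked order, and the accuracy recorded after each addition to obtain a performance curve of the kind described above.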

Figure 10.

Influence of features on the performance: (a) feature importance and (b) performance evolution.

4.2.3. Performance comparison

To demonstrate the superiority of the proposed method, several related methods are applied for comparison, including convolutional neural networks (CNNs), SAE, MKL-SVM, and LibSVM. It should be noted that research focused on the diagnosis and recognition of the wheel diameter difference is currently scarce. The comparison methods are designed and implemented by the authors, and the main parameters are set as consistently as possible to ensure a reliable comparison. The parameters are described as follows:

1. CNN: The network structure of the CNN includes one convolutional layer, one pooling layer, one fully connected layer, and one softmax layer. The structure is selected as 27-20-10-3 which is consistent with the proposed method. The iteration number, momentum, and learning rate are 800, 0.6, and 0.15, respectively.

2. SAE: The network structure parameters are 27-20-10-3. The learning rate and momentum for each AE are 0.3 and 0.1, respectively. The training iteration number is 80 and fine-tuning iteration is 50.

3. MKL-SVM: All datasets are represented by the Gaussian kernel. Taking advantage of the grid search method, the weight parameter is optimized in the span of [0, 1]. The penalty constant and kernel parameters are optimized in the span of [0, 100].

4. LibSVM: The Gaussian kernel is adopted. The radius of the kernel function and the penalty factor is optimized by the grid search method in the span of [0, 10].

For each method, the fault diagnosis trial is repeated 10 times on a dataset with a sample size of 1000. In each trial, 90% of all samples are randomly selected to train the model and the remaining 10% are used for testing. The detailed classification performance of the methods mentioned above is illustrated in Figure 11 and listed in Table 4. For all methods, fault diagnosis based on multi-sensor information fusion gives the best performance. In other words, diagnosing the WDD condition from the information of a single sensor is insufficient and one-sided. Compared with other methods, the proposed method fusing the information of axle-box lateral and vertical acceleration produces the highest diagnosis accuracy (95.2% ± 2.8%). Therefore, the proposed multi-sensor fusion method can improve the performance of fault diagnosis for the WDD.
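The evaluation protocol (10 random 90%/10% splits, reported as mean accuracy and standard deviation) can be sketched as follows. The nearest-centroid classifier and the synthetic two-dimensional data are placeholders standing in for the actual models and features, and the use of the sample standard deviation is an assumption.

```python
import random
import statistics


def nearest_centroid_fit(X, y):
    """Placeholder classifier (not the paper's model): one centroid per class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def nearest_centroid_predict(centroids, x):
    """Return the label of the closest centroid in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def repeated_holdout(X, y, trials=10, train_fraction=0.9, seed=0):
    """Mean and sample standard deviation of test accuracy (in %) over random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        cut = int(round(train_fraction * len(idx)))
        train, test = idx[:cut], idx[cut:]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        scores.append(100.0 * hits / len(test))
    return statistics.mean(scores), statistics.stdev(scores)


rng = random.Random(1)
centers = {0.8: 0.0, 2.1: 1.5, 4.0: 3.0}  # WDD label in mm -> toy feature mean
X, y = [], []
for label, c in centers.items():
    for _ in range(100):
        X.append([rng.gauss(c, 1.0), rng.gauss(c, 1.0)])  # synthetic features
        y.append(label)
mean_acc, sd_acc = repeated_holdout(X, y)
print(f"{mean_acc:.1f} (± {sd_acc:.1f})")
```

Reporting the spread alongside the mean matters here because each test split contains only 10% of the samples, so single-trial accuracies can vary noticeably.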

Figure 11.

Comparison of accuracy of various methods.

Table 4.

Average accuracy (%) and standard deviation of various methods over 10 trials.

Sensor	CNN	SAE	MKL-SVM	LibSVM	Proposed
1	86.6 (± 3.2)	83.3 (± 5.3)	82.1 (± 5.7)	79.8 (± 5.3)	89.0 (± 2.3)
2	82.0 (± 5.1)	81.3 (± 7.4)	80.1 (± 6.2)	78.7 (± 7.0)	84.2 (± 4.5)
1&2	91.0 (± 3.7)	89.2 (± 3.8)	90.2 (± 4.2)	85.6 (± 4.8)	95.2 (± 2.8)

The proposed method gives the best performance in every row, for both single-sensor and multi-sensor inputs.

To further examine the proposed method, the classification accuracies of different methods with various numbers of training samples are evaluated. Figure 12 illustrates the fault diagnosis accuracy for datasets of different sample sizes. For all comparative methods, the diagnosis accuracy tends to decrease as the sample size decreases. The main reason is that a limited number of training samples cannot provide enough fault information to train the classifier. The proposed method performs better than the other methods for all training sample sizes, and its advantage is more pronounced when the training sample size is small. The area under the curve (AUC) of the proposed method reaches 0.961, 0.962, and 0.992 for the WDDs of 0.8 mm, 2.1 mm, and 4 mm, respectively (the ROC curves of different methods can be found in Supplementary Figure S4 in the supplementary information file). It can be concluded that the proposed method can effectively reduce false positives.
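For a single class treated one-versus-rest, the AUC can be computed directly from classifier scores as the probability that a positive sample is ranked above a negative one, as sketched below. The scores and labels are toy values, and the one-versus-rest treatment of the three WDD classes is an assumption about how the per-class AUC values are obtained.

```python
def auc_one_vs_rest(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]                  # toy scores for one class
is_positive = [True, True, False, True, False, False]    # toy ground truth
print(auc_one_vs_rest(scores, is_positive))  # 8 of 9 pairs ranked correctly
```

An AUC close to 1 means that few negatives receive higher scores than positives at any threshold, which is the sense in which a high AUC relates to a low false positive rate.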

Figure 12.

Performance of different methods with different training sample sizes.

5. Conclusions

Aiming at the WDD of railway freight wagons, a novel method of fault diagnosis is proposed in this paper, which can fuse the information of multiple sensors and improve the recognition accuracy. The effectiveness of the proposed method is verified by the simulation and experimental datasets. The main conclusions are drawn as follows:

1. A multi-sensor fusion fault diagnosis method for the WDD is proposed based on SAE and MKL-SVM. Compared with other commonly used intelligent methods, the proposed method has better feature learning ability and classification performance.

2. The features sensitive to the WDD are analyzed. Multi-sensor information fusion achieves higher recognition accuracy with fewer features than a single sensor, which greatly improves computational efficiency.

3. For all methods, the diagnosis accuracy decreases as the number of training samples decreases. The proposed method outperforms the other methods at every training sample size, and its advantage is more pronounced for small training sample sizes.
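Conclusion 1 rests on fusing the sensors at the kernel level. As a rough illustration only, the sketch below builds one RBF Gram matrix per sensor and combines them with fixed convex weights. In multiple kernel learning the weights are learned together with the SVM, which this sketch does not attempt; the weights, the `gamma` value, the feature vectors, and the function names are all hypothetical.

```python
import math


def rbf_gram(features, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for one sensor."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [[math.exp(-gamma * sqdist(a, b)) for b in features] for a in features]


def combine_kernels(grams, weights):
    """Convex combination K = sum_m w_m * K_m with w_m >= 0 and sum(w_m) = 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(grams[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
            for i in range(n)]


# Hypothetical learned features for three samples from two axle-box sensors.
sensor1 = [[0.1, 0.2], [0.0, 0.3], [0.9, 0.8]]
sensor2 = [[1.0], [1.1], [0.2]]

fused = combine_kernels(
    [rbf_gram(sensor1, gamma=1.0), rbf_gram(sensor2, gamma=1.0)],
    weights=[0.6, 0.4],  # fixed here for illustration; MKL would learn these
)
```

A matrix such as `fused` could then be handed to an SVM implementation that accepts a precomputed kernel. Because a convex combination of valid kernels is itself a valid kernel, each sensor can contribute according to its weight without concatenating the raw feature vectors.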

Supplemental Material

Supplementary Material for Fault diagnosis method of wheel diameter difference based on stacked autoencoder and multiple kernel learning by Shunqi Sui, Kaiyun Wang, Liang Ling, Shiqian Chen, and Bo Xie in Journal of Vibration and Control.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China (Grant No. 51825504, 51735012) and the Sichuan Science and Technology Program (Grant No. 2020YJ0213).

ORCID iD

Shunqi Sui

References

Bucak, Jin and Jain (2014) Multiple kernel learning for visual object recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(7): 1354–1369.

Chen, Wang, Chang, et al. (2021) A two-level adaptive chirp mode decomposition method for the railway wheel flat detection under variable-speed conditions. Journal of Sound and Vibration 498(4): 115963.

Chen and Sun (2020) Vibration of axle box from wheel diameter difference in vehicle. Journal of Shanghai Jiaotong University (Science) 25(4): 509–518.

Enblom, Sichani, Berg, et al. (2016) An alternative to FASTSIM for tangential solution of the wheel-rail contact. Vehicle System Dynamics: International Journal of Vehicle Mechanics and Mobility 54(5/6): 748–764.

Huang (2012) Multi-label ReliefF and F-statistic feature selections for image annotation. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, 16–21 June 2012. IEEE, pp. 2352–2359.

Jablon, Avila, Borba, et al. (2021) Diagnosis of rotating machine unbalance using machine learning algorithms on vibration orbital features. Journal of Vibration and Control 27(3–4): 468–476.

Yao, Song, et al. (2021) Intelligent fault diagnosis method for common rail injectors based on hierarchical weighted permutation entropy and pair-wise feature proximity feature selection. Journal of Vibration and Control 28: 2386–2398. DOI: 10.1177/10775463211010521

Lei and Wang (2020) Contact and creep characteristics of wheel–rail system under harmonic corrugation excitation. Journal of Vibration and Control 27(17–18): 2069–2080.

Li, Zhu, et al. (2021b) Influence of wheel profile wear coupled with wheel diameter difference on the dynamic performance of subway vehicles. Shock and Vibration 2021(8): 1–15.

Li, Zhong, Shao, et al. (2021a) Multi-sensor gearbox fault diagnosis by using feature-fusion covariance matrix and multi-Riemannian kernel ridge regression. Reliability Engineering and System Safety 216: 108018.

Tian, Jiang, et al. (2020) Distributed-ensemble stacked autoencoder model for non-linear process monitoring. Information Sciences 542: 302–316.

Lyu, Wang, Ling, et al. (2020) Influence of wheel diameter difference on surface damage for heavy-haul locomotive wheels: measurements and simulations. International Journal of Fatigue 132: 105343.

Jia, et al. (2021) Multi-vibration information fusion for detection of HVCB faults using CART and D-S evidence theory. ISA Transactions 113: 210–221.

Molatefi, Mazraeh, Shadfar, et al. (2019) Advances in Iran railway wheel wear management: a practical approach for selection of wheel profile using numerical methods and comprehensive field tests. Wear 424–425: 97–110.

Gönen and Alpaydın (2011) Multiple kernel learning algorithms. Journal of Machine Learning Research 12: 2211–2268.

Rakotomamonjy, Bach, Canu, et al. (2008) SimpleMKL. Journal of Machine Learning Research 9: 2491–2521.

Shi, Lin, et al. (2020) Ensemble empirical mode decomposition-entropy and feature selection for pantograph fault diagnosis. Journal of Vibration and Control 26(23–24): 2230–2242.

Sun, Wang, Liu, et al. (2021) Stack autoencoder transfer learning algorithm for bearing fault diagnosis based on class separation and domain fusion. IEEE Transactions on Industrial Electronics 69(3): 3047–3058.

Suj, Ydl and Isk (2021) A distributed sensor-fault detection and diagnosis framework using machine learning. Information Sciences 547: 777–796.

Torabi, Mousavi and Younesian (2018) A high accuracy imaging and measurement system for wheel diameter inspection of railroad vehicles. IEEE Transactions on Industrial Electronics 65(10): 8239–8249.

Vincent, Larochelle, Lajoie, et al. (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11(12): 3371–3408.

Wang and Zhang (2021) Bridging deep and multiple kernel learning: a review. Information Fusion 67(2): 3–13.

Zhang, Cheng, et al. (2021) A hybrid classification autoencoder for semi-supervised fault diagnosis in rotating machinery. Mechanical Systems and Signal Processing 149: 107327.

Yan, Feng and Cui (2014) A simple method for dynamically measuring the diameters of train wheels using a one-dimensional laser displacement transducer. Optics and Lasers in Engineering 53: 158–163.
