Abstract
The tread wear rates of the right and left wheels of a wheelset differ because of complex track conditions, which produces a wheel diameter difference (WDD). The WDD can degrade vehicle dynamic performance and shorten the service life of the wheelset. To diagnose and recognize the WDD condition in time, a data-driven method based on multi-sensor information fusion is proposed. Statistical features are extracted from the time and frequency domains of the axle-box acceleration signals and fused by integrating a stacked autoencoder with multiple kernel learning. Comparative experiments show that, compared with other commonly used intelligent methods, the proposed method achieves higher diagnostic accuracy and performs better with small training sample sizes. The statistical features sensitive to the WDD are also analyzed for industrial application.
1. Introduction
In railway transportation, freight wagons are widely used for carrying cargo (Molatefi et al., 2019). For railway vehicles, a wheel diameter difference (WDD) frequently develops during long-term operation because the two wheels of a wheelset wear at different rates. An increasing WDD deteriorates the dynamic performance of railway vehicles and leads to rolling contact fatigue of the wheel tread (Li et al., 2021b; Lyu et al., 2020). Therefore, it is important to monitor the WDD condition accurately. The WDD of a wheelset can be obtained by comparing the diameter and profile of both wheels. In general, wheel diameter measurement for railway vehicles falls into two categories: (1) static measurement and (2) dynamic measurement. In static measurement, the vehicle is stationary and the measurement sensors are portable. At present, condition monitoring of the WDD mainly depends on regular manual measurement, which is time-consuming and unreliable. In dynamic measurement, the sensors are mounted onboard or at the wayside. In recent decades, several dynamic, non-contact measurement methods have been proposed (Torabi et al., 2018; Yan et al., 2014). However, the precision and robustness of apparatus based on optical or machine-vision techniques are affected by environmental factors such as light, dust, and rain.
Traditional condition monitoring methods for railway vehicles are mainly based on time- or frequency-domain analysis of vibration signals (Chen et al., 2020, 2021; Lei and Wang, 2020; Ke et al., 2021; Shi et al., 2020). Unlike other railway faults, such as wheel flats and wheel polygonization, the WDD has no definite characteristic frequency under the interference of random track irregularity. Therefore, it is difficult to detect the WDD with traditional time and frequency analysis methods. Many data-driven fault diagnosis methods have been developed based on a single sensor. Nevertheless, a single sensor often cannot fully monitor the condition of equipment because of its limited installation location and coverage range. To enhance diagnostic performance by fusing multi-sensor data, several information fusion schemes have been proposed (Li et al., 2021a; Jablon et al., 2021; Ma et al., 2021; Suj et al., 2021). These methods can be divided into three forms: data-layer fusion, feature-layer fusion, and decision-layer fusion. Compared with the other two strategies, feature-layer fusion offers better fault tolerance and handles heterogeneous features more readily. It fuses the features extracted from the fault signals, which also compresses a large amount of data.
The autoencoder (AE) (Wu et al., 2021), a kind of artificial neural network, learns a representation by encoding the input data in an unsupervised manner. Compared with other methods, the AE has a stronger capability for feature fusion and can obtain a deeper, more general representation of the input data. Kernel methods represent a well-established learning paradigm. Multiple kernel learning (MKL) replaces a single kernel with a combination of multiple basic kernels (Nen et al., 2011). By mapping the data into a higher-dimensional space, the combined kernel can capture complex nonlinear patterns in the input data. By taking advantage of the feature fusion ability of the stacked autoencoder (SAE) and the pattern classification capacity of MKL, an intelligent fault diagnosis method is proposed and applied to the WDD of freight wagons. The main contributions of this work can be summarized as follows:
1. For fault diagnosis of the WDD, a novel multi-sensor information fusion method is proposed by integrating SAE and MKL.
2. The proposed method is validated with simulation and experimental data. Compared with some well-known methods, the proposed method improves the diagnosis performance for the WDD, especially when the training sample size is small.
3. The application conditions of the proposed method, such as operation speed and sensitive features, are analyzed in detail, which indicates that the proposed method is suitable for industrial application.
The organization of the paper is shown as follows. The basic theory is briefly introduced in Section 2. Section 3 shows the detail of the proposed method. In Section 4, the effectiveness of the proposed method is validated by simulation data and experimental data. Finally, general conclusions are given in Section 5.
2. Theoretical background
2.1. Autoencoder
The common architecture of an autoencoder (AE) is a three-layer neural network consisting of the input layer, the hidden layer, and the output layer. The hidden layer has fewer neurons than the other layers. Therefore, the input data
The latter portion of AE transforms the features of the hidden layer into a reconstruction vector
The parameters of AE are trained by minimizing the mean squared error (MSE) loss. For a given set of data
The reconstruction error can be represented by the binary cross-entropy loss (Wu et al., 2021), which can be expressed as follows
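For reference, the standard textbook form of a single AE is sketched below. The symbols are notation assumptions rather than the authors' own: $W, b$ and $W', b'$ denote encoder and decoder parameters, $f$ and $g$ the activation functions, $N$ the number of samples, and $D$ the input dimension.

```latex
\begin{align}
  h &= f(Wx + b), \qquad \hat{x} = g(W'h + b'), \\
  L_{\mathrm{MSE}} &= \frac{1}{N}\sum_{i=1}^{N} \lVert x_i - \hat{x}_i \rVert^2, \\
  L_{\mathrm{BCE}} &= -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{D}
     \left[ x_{ij}\log \hat{x}_{ij} + (1 - x_{ij})\log\left(1 - \hat{x}_{ij}\right) \right].
\end{align}
```

The binary cross-entropy form presumes that the inputs are scaled to $[0, 1]$ and that the decoder output lies in $(0, 1)$, for example through a sigmoid activation.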
After the encoding and decoding process of the AE, the hidden layer is expected to extract the more relevant and essential features of the input data. To handle more complex data, several improved methods based on the autoencoder have been proposed, such as the stacked autoencoder (SAE), the denoising autoencoder (DAE), and the variational autoencoder (VAE). The SAE, a cascade of multiple AEs, is used to extract features layer by layer (Sun et al., 2021). In the pre-training stage of the SAE, the features extracted by each AE are fed to the subsequent AE. Pre-training places the initial parameters of the SAE network in a suitable state, which aids iterative convergence in the supervised stage. When the hidden layer has fewer nodes than the input layer, the input information is fused and represented by low-dimensional features.
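The layer-by-layer cascade can be illustrated with a minimal sketch of the forward pass through stacked encoders. The layer sizes 27-20-10 follow the structure quoted later in Section 4.2.3; the sigmoid activation, the random untrained weights, and all function names are assumptions made only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one encoder with small random weights (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def encode(x, layer):
    """Apply one encoder: h = sigmoid(W x + b)."""
    weights, biases = layer
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def stacked_encode(x, layers):
    """Feed the output of each encoder to the next; return the features of every layer."""
    features = []
    h = x
    for layer in layers:
        h = encode(h, layer)
        features.append(h)
    return features


rng = random.Random(0)
sizes = [27, 20, 10]  # input layer followed by two hidden layers (assumed from Section 4.2.3)
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
x = [rng.random() for _ in range(sizes[0])]
features = stacked_encode(x, layers)
print([len(f) for f in features])  # [20, 10]
```

Each hidden layer is narrower than its input, so the 27 input features are progressively compressed into 20 and then 10 fused features.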
2.2. Multiple kernel learning
The most frequently used kernel method is the support vector machine (SVM). In general, an SVM with a single kernel function is used for binary classification problems. The SVM algorithm separates the classes with the maximum margin, which is bounded by two separating planes parallel to the decision hyperplane. However, for a multi-classification task, the features projected by a single kernel function may not be distinguishable between classes. MKL has been shown to improve the discriminative and generalization performance of the classifier. Within this framework, the problem of feature representation is converted into optimizing the combination weights of multiple kernel matrices. Consequently, MKL can achieve multi-source information fusion by combining kernel functions. The MKL-based SVM (MKL-SVM) learns both the kernel combination weights and the decision boundaries in a single optimization problem, so the weight coefficient of each kernel function is learned jointly with the SVM objective.
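The idea of a combined kernel can be sketched as a convex combination of basic kernels. The example below assumes Gaussian basic kernels with hypothetical bandwidths `gammas` and fixed, hand-picked weights; in MKL-SVM the weights would instead be learned together with the SVM, which this sketch does not attempt.

```python
import math


def gaussian_kernel(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_kernel(x, y, gammas, weights):
    """Convex combination of basic kernels: sum_m d_m * k_m(x, y)."""
    return sum(d * gaussian_kernel(x, y, g) for d, g in zip(weights, gammas))


def gram_matrix(samples, gammas, weights):
    """Kernel matrix of the combined kernel over all pairs of samples."""
    return [[combined_kernel(xi, xj, gammas, weights) for xj in samples]
            for xi in samples]


samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]  # toy data, not from the paper
gammas = [0.1, 1.0, 10.0]    # assumed bandwidths of three basic kernels
weights = [0.5, 0.3, 0.2]    # assumed non-negative weights summing to 1
K = gram_matrix(samples, gammas, weights)
```

Because the weights are non-negative and sum to one, the combined kernel remains a valid kernel and its diagonal entries equal one for Gaussian basic kernels.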
3. Methodology
This section details the proposed fault diagnosis strategy for the WDD. Figure 1 shows the framework of the proposed method. First, the vibration signals of wheelsets with different WDDs are collected. Second, multi-domain features of the vibration signals are extracted, and an ensemble feature-level fusion model is built by integrating the SAE and MKL-SVM. The model is pre-trained with unlabeled data and fine-tuned with a small amount of labeled data. Finally, the extracted features are input to the model for feature fusion and fault diagnosis.
Figure 1. Framework of the proposed method.
3.1. Multi-domain feature extraction
Statistical features in the time and frequency domains.
where
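As an illustration of multi-domain feature extraction, the sketch below computes a few common time- and frequency-domain statistics of one acceleration segment. The specific features, their names, and the naive DFT are assumptions chosen for clarity and are not necessarily the set used by the authors; only the peak-to-peak value is explicitly named in the text.

```python
import cmath
import math


def time_features(signal):
    """Common time-domain statistics of a 1-D signal (must not be constant)."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((v - mean) ** 2 for v in signal) / n
    rms = math.sqrt(sum(v * v for v in signal) / n)
    peak = max(abs(v) for v in signal)
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "rms": rms,
        "peak_to_peak": max(signal) - min(signal),
        "kurtosis": sum((v - mean) ** 4 for v in signal) / (n * var ** 2),
        "crest_factor": peak / rms,
    }


def frequency_features(signal, fs):
    """Amplitude-weighted mean frequency and spread from a naive one-sided DFT."""
    n = len(signal)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(signal))
        freqs.append(k * fs / n)
        amps.append(abs(coeff))
    total = sum(amps)
    centroid = sum(f * a for f, a in zip(freqs, amps)) / total
    spread = math.sqrt(sum((f - centroid) ** 2 * a
                           for f, a in zip(freqs, amps)) / total)
    return {"mean_frequency": centroid, "frequency_std": spread}


fs = 2000  # sampling frequency in Hz, as in the field test of Section 4.2
segment = [math.sin(2 * math.pi * 50 * t / fs) for t in range(200)]  # synthetic 50 Hz tone
tf = time_features(segment)
ff = frequency_features(segment, fs)
```

For real signals an FFT routine would replace the naive DFT; the quadratic-cost version is used here only to keep the sketch dependency-free.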
3.2. Feature fusion based on SAE and MKL-SVM
The training process of the SAE consists of two steps: unsupervised pre-training and supervised fine-tuning (Li et al., 2020). In the pre-training stage, each AE is trained separately. By minimizing the reconstruction error layer by layer, the hidden features are extracted without supervision. In the fine-tuning stage, only a small amount of labeled data is used to train the classifier in a supervised manner. Figure 2 illustrates the typical learning process of the SAE. For the l-th layer of SAE, the encoder produces the feature
Figure 2. Architecture and training process of the SAE.

A sparse constraint based on the Kullback–Leibler divergence (Vincent et al., 2010) is introduced to the SAE to avoid overfitting, as expressed in equations (8) and (9)
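A minimal sketch of a KL-divergence sparsity penalty in its commonly used form is given here. The target sparsity `rho`, the function names, and the toy activations are assumptions for illustration, and the weighting of this penalty in the total loss is left out.

```python
import math


def mean_activations(hidden_batch):
    """Average activation of each hidden unit over a batch of samples."""
    n = len(hidden_batch)
    return [sum(sample[j] for sample in hidden_batch) / n
            for j in range(len(hidden_batch[0]))]


def kl_sparsity_penalty(rho_hat, rho):
    """Sum over hidden units of KL(rho || rho_hat_j) for Bernoulli distributions."""
    return sum(
        rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
        for r in rho_hat
    )


# Toy hidden activations in (0, 1): two samples, three hidden units (hypothetical values).
hidden_batch = [[0.05, 0.60, 0.10], [0.05, 0.40, 0.30]]
rho_hat = mean_activations(hidden_batch)
penalty = kl_sparsity_penalty(rho_hat, rho=0.05)
```

The penalty is zero when every hidden unit has an average activation equal to the target and grows as units become more active on average, which discourages dense hidden representations.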
The reconstruction error can be described as
In the fine-tuning process of the SAE, the hidden layers are initialized with the weights and biases obtained from pre-training, which mitigates vanishing or exploding gradients compared with random initialization. The loss function of the classifier is shown in equation (11).
For the problem of binary classification, suppose that we are given the samples:
Different from the optimization criterion with only a fixed kernel, MKL constructs the kernel k by combining a set of the predefined kernels (Bucak et al., 2014; Wang et al., 2021). The feature can take the form of
Finally, the decision function for MKL-SVM can be expressed by
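In the standard MKL-SVM formulation of Rakotomamonjy et al. (2008), which is assumed here, the combined kernel and the binary decision function take the form below. The symbols are notation assumptions: $d_m$ are the kernel weights, $k_m$ the $M$ basic kernels, $\alpha_i$ the dual coefficients, $y_i \in \{-1, +1\}$ the labels, and $b$ the bias.

```latex
\begin{align}
  k(x, x') &= \sum_{m=1}^{M} d_m\, k_m(x, x'), \qquad d_m \ge 0, \quad \sum_{m=1}^{M} d_m = 1, \\
  f(x) &= \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i\, y_i \sum_{m=1}^{M} d_m\, k_m(x_i, x) + b \right).
\end{align}
```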
In the MKL-SVM, there are three parameters: weight parameter, penalty constant, and kernel parameters. The simple MKL algorithm (Rakotomamonjy et al., 2008) based on
4. Simulation and experimental verification
4.1. Simulation validation
Main parameters of freight wagons.
The axle-box lateral and vertical accelerations for different WDDs are illustrated in Figure 3(a) and (b). The amplitudes of the impulse signals excited by random irregularity increase as the WDD increases. For the vertical acceleration, the difference between the signals of different WDDs is relatively small. Figure 4 presents the wheel/rail contact position for different WDDs. The result indicates that, as the WDD increases, the contact position moves toward the flange for the small wheel and away from it for the large wheel. The equivalent conicity is higher in the zone close to the flange, which aggravates the lateral vibration of the vehicle. Because of the self-centering characteristic of the wheelset during operation, the lateral displacement of the wheelset keeps increasing as the WDD grows. When a wheelset with a large WDD runs at high speed, the flange collides with the rail frequently, causing severe impacts and compromising the safety of railway vehicles.
Figure 3. Axle-box acceleration of different wheel diameter differences: (a) lateral and (b) vertical.
Figure 4. Wheel/rail contact position: (a) right wheel: small diameter and (b) left wheel: large diameter.

Main hyperparameters of the model.

Visualization of the fused features: (a) sensor 1; (b) sensor 2; and (c) sensor 1&2.
The distributions of feature samples during the learning process of multi-sensor fusion are illustrated in Figure 6. Figure 6(a) gives the 2D representation of the raw features without feature fusion. Figure 6(b)–(e) shows the 2D representations of the encoded features given by the four hidden layers (SAE1, SAE2, SAE3, and SAE4). Figure 6(f) illustrates the distribution of the final output features fused by SAE and MKL. The raw features are extremely disordered and cannot be separated directly, and the difference between the features of different WDDs is not obvious. The features extracted by the encoders are more organized, and features of the same WDD condition cluster closely. Finally, the output features of the different WDD conditions are clearly separated. This separation translates into more accurate fault diagnosis of the WDD. The results reveal that the proposed method has a strong capacity for learning discriminative representations from the input data. The confusion matrix of the proposed method is given in Figure 7. The accuracy of the multi-sensor information fusion method is 97.3%. These results show that the proposed method can capture representative features and distinguish the WDD conditions.
Figure 6. Two-dimensional feature visualization: (a) raw features, (b) SAE1, (c) SAE2, (d) SAE3, (e) SAE4, and (f) output features. SAE: stacked autoencoder.
Figure 7. Confusion matrix based on simulation data.

4.2. Experimental analysis
Three freight wagons with different WDDs (0.8 mm, 2.1 mm, and 4 mm) are tested, with acceleration sensors mounted on their axle-boxes, as shown in Figure 8. The lateral and vertical axle-box accelerations are measured at a sampling frequency of 2000 Hz. During the experiment, the freight wagons are accelerated to 75 km/h, run at a constant speed, and then slowed down to 0 km/h. In total, 7000 samples are extracted from the acceleration signals of each wagon, including 800 samples from the acceleration phase and 900 samples from the deceleration phase.
Figure 8. Field measurement: (a) location of sensors and (b) speed.
For the wagons with different WDDs, the raw acceleration signals of the axle-box are used to calculate and extract statistical features as mentioned above (The distribution of features of different WDDs can be found in Supplementary Figure S3 in the supplementary information file). It can be found that there is a significant difference in the distribution of features. Some features, such as peak–peak value (
4.2.1. Effect of operation speed
To analyze the effect of operation speed, the features extracted at different operation speeds are used to recognize the WDD condition of the freight wagons. Over the whole experiment, a span of 500 samples is input into the model at a time. The diagnostic performance based on multi-sensor fusion is compared with that based on a single sensor, as shown in Figure 9. The results indicate that the classification error at constant speed is lower than that during acceleration or deceleration. The minimum classification error, 7.61%, is given by multi-sensor fusion. It should be noted that the features extracted from the vibration signals are influenced not only by operation speed but also by complicated track conditions. Consequently, the classification errors differ somewhat even for samples collected at the same operation speed. The samples in the range of 2000–3000, which give the best classification performance, are selected for further study.
Figure 9. Influence of the speed on the wheel diameter difference diagnosis.
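The span-wise evaluation can be sketched as follows, assuming consecutive, non-overlapping spans of 500 samples; whether the spans overlap is not stated in the text. The function name and the toy labels are hypothetical.

```python
def spanwise_error(y_true, y_pred, span=500):
    """Classification error of each consecutive, non-overlapping span of samples."""
    errors = []
    for start in range(0, len(y_true) - span + 1, span):
        stop = start + span
        wrong = sum(t != p for t, p in zip(y_true[start:stop], y_pred[start:stop]))
        errors.append((start, stop, wrong / span))
    return errors


# Toy labels: 1500 samples of one class with 50 misclassified samples in the second span.
y_true = [0] * 1500
y_pred = [0] * 500 + [1] * 50 + [0] * 450 + [0] * 500
for start, stop, err in spanwise_error(y_true, y_pred):
    print(f"samples {start}-{stop}: error {err:.1%}")
```

Plotting these per-span errors against the mean speed of each span would give a curve comparable in spirit to the one described for Figure 9.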
4.2.2. Sensitive feature analysis
In general, better classification performance and computational efficiency can be obtained by discarding irrelevant features. Figure 10(a) gives the importance of the features as evaluated by the ReliefF algorithm (Huang, 2012). Figure 10(b) shows how the performance evolves with the number of ranked features. The proposed multi-sensor fusion method achieves a classification accuracy of 95.15% with eight ranked features. With a single sensor, the classification accuracy is 93.37% (18 ranked features) and 91.74% (19 ranked features) for the two sensors, respectively. The results reveal that fault diagnosis based on a single sensor is limited in recognizing the WDD condition. The proposed method offers better performance with fewer features, which greatly improves computational efficiency.
Figure 10. Influence of features on the performance: (a) feature importance and (b) performance evolution.
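Feature ranking of this kind can be illustrated with a simplified Relief score. This is a sketch of the basic Relief idea with one nearest hit and one nearest miss per sample, not the full ReliefF algorithm used in the paper (which averages over several neighbours and weights classes by their priors); the toy data and names are hypothetical.

```python
def relief_scores(X, y):
    """Simplified Relief: reward features that differ at the nearest miss
    and penalize features that differ at the nearest hit."""
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]
    # Scale every feature to [0, 1] so that differences are comparable.
    Z = [[(row[j] - lo[j]) / span[j] for j in range(d)] for row in X]

    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    scores = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(Z[i], Z[k]))
        miss = min(misses, key=lambda k: dist(Z[i], Z[k]))
        for j in range(d):
            scores[j] += (abs(Z[i][j] - Z[miss][j]) - abs(Z[i][j] - Z[hit][j])) / n
    return scores


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [1.0, 0.8], [1.1, 0.2], [1.2, 0.5]]
y = [0, 0, 0, 1, 1, 1]
scores = relief_scores(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Keeping only the first few entries of `ranking` and retraining the classifier for each subset size gives a performance-versus-features curve of the kind described for Figure 10(b).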
4.2.3. Performance comparison
To demonstrate the superiority of the proposed method, several related methods are applied for comparison, including a convolutional neural network (CNN), SAE, MKL-SVM, and LibSVM. It should be noted that research on the diagnosis and recognition of the wheel diameter difference is currently scarce. The comparison methods were therefore designed and implemented by the authors, and the main parameters are set as consistently as possible to ensure a reliable comparison. The parameters are described as follows:
1. CNN: The network includes one convolutional layer, one pooling layer, one fully connected layer, and one softmax layer. The structure is selected as 27-20-10-3, which is consistent with the proposed method. The iteration number, momentum, and learning rate are 800, 0.6, and 0.15, respectively.
2. SAE: The network structure is 27-20-10-3. The learning rate and momentum for each AE are 0.3 and 0.1, respectively. The number of training iterations is 80 and the number of fine-tuning iterations is 50.
3. MKL-SVM: All datasets are represented by the Gaussian kernel. Using grid search, the weight parameter is optimized in the span of [0, 1], and the penalty constant and kernel parameters are optimized in the span of [0, 100].
4. LibSVM: The Gaussian kernel is adopted. The radius of the kernel function and the penalty factor are optimized by grid search in the span of [0, 10].
For each method, the fault diagnosis trial is repeated 10 times on a dataset of 1000 samples. In each trial, 90% of the samples are randomly selected to train the model and the remaining 10% are used for testing. The detailed classification performance of the methods is illustrated in Figure 11 and listed in Table 4. For all methods, fault diagnosis based on multi-sensor information fusion gives the best performance. In other words, diagnosing the WDD condition from a single sensor is insufficient and one-sided. Compared with the other methods, the proposed method, which fuses the axle-box lateral and vertical accelerations, produces the highest diagnosis accuracy (95.2% ± 2.8%). Therefore, the proposed multi-sensor fusion method can improve the performance of fault diagnosis for the WDD.
Figure 11. Comparison of accuracy of various methods.
Table 4. Average accuracy and standard deviation of various methods over 10 trials. Bold values mark the best performance, which the proposed method gives for both single-sensor and multi-sensor inputs.
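The evaluation protocol (10 repetitions of a random 90/10 split, reported as mean ± standard deviation) can be sketched as below. The nearest-centroid classifier and the synthetic one-dimensional data are placeholders standing in for the actual models and features, and all names are hypothetical.

```python
import random
import statistics


def repeated_holdout(samples, labels, train_and_predict,
                     n_trials=10, train_fraction=0.9, seed=0):
    """Mean and standard deviation of test accuracy over repeated random splits."""
    rng = random.Random(seed)
    n = len(samples)
    n_train = int(round(train_fraction * n))
    accuracies = []
    for _ in range(n_trials):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        preds = train_and_predict([samples[i] for i in train],
                                  [labels[i] for i in train],
                                  [samples[i] for i in test])
        correct = sum(p == labels[i] for p, i in zip(preds, test))
        accuracies.append(correct / len(test))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def nearest_centroid(train_x, train_y, test_x):
    """Placeholder classifier on scalar features: predict the class with the closest mean."""
    centroids = {c: statistics.mean(x for x, t in zip(train_x, train_y) if t == c)
                 for c in set(train_y)}
    return [min(centroids, key=lambda c: abs(x - centroids[c])) for x in test_x]


# Synthetic, well-separated data: 1000 samples in three classes (not the paper's data).
data_rng = random.Random(1)
labels = [i % 3 for i in range(1000)]
samples = [c + data_rng.uniform(-0.2, 0.2) for c in labels]
mean_acc, std_acc = repeated_holdout(samples, labels, nearest_centroid)
print(f"accuracy: {mean_acc:.1%} +/- {std_acc:.1%}")
```

Any of the compared models can be plugged in through the `train_and_predict` callable, so every method is scored on identical splits when the same seed is used.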
To further examine the proposed method, the classification accuracies of the different methods are evaluated with various numbers of training samples. Figure 12 illustrates the fault diagnosis accuracy for datasets of different sample sizes. For all methods, the diagnosis accuracy tends to decrease as the sample size decreases, mainly because a limited number of training samples cannot provide enough fault information to train the classifier. The proposed method performs better than the other methods for all training sample sizes, and its advantage is more pronounced when the training sample size is small. The area under the curve (AUC) of the proposed method reaches 0.961, 0.962, and 0.992 for the WDDs of 0.8 mm, 2.1 mm, and 4 mm, respectively (the ROC curves of the different methods can be found in Supplementary Figure S4 in the supplementary information file). It can be concluded that the proposed method effectively reduces false positives.
Figure 12. Performance of different methods with different training sample sizes.
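A per-class AUC of this kind can be computed in a one-vs-rest manner from classifier scores. The sketch below uses the rank-based (Mann-Whitney) definition of the AUC; the scores and labels are made-up toy values, and the one-vs-rest setup is an assumption about how the per-WDD values were obtained.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC as the probability that a positive sample outranks a negative one
    (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores for the class "4 mm" (hypothetical values, not the paper's results).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = ["4 mm", "4 mm", "0.8 mm", "4 mm", "2.1 mm"]
auc = auc_one_vs_rest(scores, labels, positive="4 mm")
```

Here five of the six positive-negative pairs are ranked correctly, so the AUC is 5/6; a value close to 1 means that few negatives are scored above positives.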
5. Conclusions
A novel fault diagnosis method for the WDD of railway freight wagons is proposed in this paper. It fuses the information of multiple sensors and improves the recognition accuracy. The effectiveness of the proposed method is verified with simulation and experimental datasets. The main conclusions are as follows:
1. A multi-sensor fusion fault diagnosis method for the WDD is proposed based on SAE and MKL-SVM. Compared with other widely used intelligent methods, the proposed method has better feature learning ability and classification performance.
2. The sensitive features for WDD diagnosis are analyzed. Recognition based on multi-sensor information fusion offers better performance with fewer features than recognition based on a single sensor, which greatly enhances computational efficiency.
3. The diagnosis accuracy decreases for all methods when the number of training samples decreases. The proposed method performs better than the other methods for different training sample sizes, and its advantage is more significant for small training sample sizes.
Supplemental Material
Supplemental Material - Fault diagnosis method of wheel diameter difference based on stacked autoencoder and multiple kernel learning
Supplementary Material for Fault diagnosis method of wheel diameter difference based on stacked autoencoder and multiple kernel learning by Shunqi Sui, Kaiyun Wang, Liang Ling, Shiqian Chen, and Bo Xie in Journal of Vibration and Control.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China (Grant No. 51825504, 51735012) and the Sichuan Science and Technology Program (Grant No. 2020YJ0213).
Supplemental Material
Supplemental material for this article is available online.
References
