Hybrid feature selection method for SVM classification and its application for fault diagnosis of wear and peeling in journal bearing with a little muddy water using long-term real data

Abstract

Rotating machines are widely used as components in various industries around the world, and their normal operation is important. Thus, condition monitoring and fault diagnosis of rotating machines have received considerable attention in recent years. Industrial statistics indicate that 40% of all large machine breakdowns are caused by broken bearings, while for small machines the corresponding figure reaches up to 90%. This study investigates fault diagnosis of journal bearings using the support vector machine (SVM) method. Experimental systems with vertical and horizontal rotating shafts were developed. No initial artificial failure was introduced into the bearing, a little muddy water was used, and long-term vibration data were obtained from both systems under normal operation until damage occurred in the journal bearing (3-hour tests were conducted repeatedly, and a total of 128 datasets for the vertical shaft and 24 datasets for the horizontal shaft were obtained). The study focuses on feature selection and proposes a hybrid feature selection method that combines the Fisher score (FS) with sequential forward selection (SFS). Its accuracy and efficiency were demonstrated experimentally, with accuracies of 97.14% and 100% for the vertical shaft and 100% for the horizontal shaft. Furthermore, using the SVM model and the hybrid feature selection method, the most important feature for the journal bearing of the horizontal rotor system was identified as the mean value of RMS, and this feature alone gives a good diagnosis result. This is a useful suggestion for selecting features for the fault diagnosis of horizontal rotating machines.

Keywords

Rotating machinery, journal bearing, wear, peeling, diagnosis, machine learning, feature selection method

Introduction

Rotating machines are widely used as components of various industrial and agricultural plants around the world and are indispensable to our lives. Their normal operation is therefore important, and any fault in rotating machinery may cause a breakdown of the entire mechanical system, which may reduce the reliability, security, and availability of the machinery. Rotating machines generally operate in tough working environments and are therefore frequently subject to faults.¹ To improve the reliability and availability of rotating machines, research on machine failure prediction, mainly using machine learning, is being actively pursued. Industrial statistics indicate that 40% of all large machine breakdowns are caused by broken bearings, while for small machines the corresponding figure reaches up to 90%.² Therefore, intelligent monitoring and fault diagnosis methods for bearings have received considerable attention from researchers in recent years.

Many machine learning methods can be used for intelligent fault diagnosis, such as the artificial neural network (ANN) and the support vector machine (SVM). ANN is a typical machine learning method; in 2008, Al-Raheem et al.³ presented a new technique for the automated detection and diagnosis of rolling bearing faults using an ANN classifier optimized by a genetic algorithm. In addition, in 2016, Kanai et al.⁴ proposed a system for model-based estimation (MBE) of deep groove ball bearings and condition-based monitoring (CBM) using ANN. In 2006, Hinton et al. proposed the autoencoder, a dimensional compression algorithm, which made it possible for the neural network itself to capture features. Deep learning using autoencoders and multi-layer neural networks then emerged, causing a third boom in machine learning. This trend also reached failure diagnosis, where deep learning has been applied as well. In 2021, Gecgel et al.⁵ classified hydrodynamic journal bearing wear using deep learning algorithms.

The two machine learning approaches introduced above (ANN and deep learning) share the problem that their processing is a black box and cannot be explained. In general, they also require large amounts of data and long computing times to perform well. Therefore, this study focuses on the SVM, which is regarded as one of the learning models with the best recognition performance among currently known methods, even for small sample sizes. Since it has high recognition performance and is not easily affected by noisy data, many failure diagnosis methods using SVM have been developed. In 2017, Unal et al.⁶ used the Hilbert transform (HT) and power spectral density (PSD) to extract features from the original audio signal of rolling bearings and used SVM for classification. Furthermore, in 2020, Salunkhe et al.⁷ proposed a method using an SVM and a mathematical model for early failure detection of damper rolling bearings.

It is worth noting that the failures in the studies above were artificially created. The actual failure in a real machine may differ from such artificially created failures, and the resulting vibration characteristics may be significantly different. Therefore, in our research, we conducted long-term rotating tests of rotating shaft systems (both vertical and horizontal shafts) supported by a journal bearing with a little muddy water, and collected all the shaft vibration data from the normal stage to the damaged stage. The failures obtained in this study were thus caused by natural operation without any artificially created failure, and the data are closer to the real situation.

Another point to be noted is appropriate feature selection. In applying machine learning methods, selecting and calculating appropriate features is essential to model training. For a long time, however, many researchers have focused on proposing new feature calculation methods for better classification, while relatively few studies have investigated methods for selecting features, especially for SVM classification. From this perspective, Yi-Wei Chen et al.⁸ proposed a filter method called the Fisher score (FS) for feature selection in 2006. In 2014, Nannan Gu et al.⁹ improved the Fisher score method. In practical applications of SVM theory such as fault diagnosis, many researchers have applied feature selection methods to save computing time and obtain better performance. In 2021, Yan-li Ma et al.¹¹ proposed a novel fault diagnosis method for rotating machinery that extracts fault features with multivariate multiscale fuzzy distribution entropy (MMFDE), selects the sensitive ones with the Fisher score, and identifies the working state with SVM.

However, in most circumstances, a filter method such as the Fisher score alone is not enough to select features, because it completely ignores the performance of the selected features on a specific classifier and evaluates the features individually. This is another point discussed in this study. In 2014, Silvia Cateni et al.¹⁰ proposed a feature selection method that combines the Fisher score (FS) with exhaustive search. Although combining a filter method and a wrapper method is a good idea, exhaustive search as the wrapper method¹² is so time-consuming and memory-consuming that it may be practically difficult to implement in many cases with large sample sizes and many features.

This study investigates three main original points. The first is the application of SVM to fault diagnosis using real long-term vibration data of a journal bearing with a little muddy water, without adding any initial artificial failure. Two types of rotor systems, a horizontal rotor system and a vertical rotor system, were investigated. Experiments were carried out, and bearing data from the new, normal stage to the damaged stage were obtained under normal rotating conditions. The features for training the SVM model were calculated, and SVM models were trained using these features. After training, a hyperplane based on the SVM model separated the data into two stages: normal (before failure) and abnormal (after failure). This hyperplane was then used to judge the state of the bearings by determining which stage the data belong to.

The second original point is the investigation of feature sensitivity in the SVM model for journal bearing diagnosis by means of feature selection. Sequential forward selection (SFS) is used as the wrapper method in place of exhaustive search, and a hybrid feature selection method combining the Fisher score (FS) with SFS is proposed to select the more important features from all the features for SVM models for journal bearing diagnosis. The efficiency and accuracy of the proposed hybrid feature selection method are evaluated.

Finally, the third original point is the identification of the most important feature in the vibration signal for the diagnosis of the journal bearing. Both vertical and horizontal rotor systems are investigated and discussed, and the most important feature for the diagnosis of the journal bearing in the horizontal shaft is identified.

Methodology

SVM method and mathematical derivation

The SVM method is a classic machine learning method,¹³ and through further research,¹⁴ the now mature SVM theory was developed.

A visualization of the most basic binary classification SVM model is shown in Figure 1. The green line denotes the maximum-margin hyperplane, while the black dashed lines denote the margins of an SVM model trained with samples from two classes. The SVM classifier tries to create a hyperplane between two groups of points in the dataset to separate the classes, and the optimum hyperplane is obtained by solving an optimization problem. The data points (or samples) nearest the hyperplane define the margin and are called the support vectors. The development of SVM and the detailed mathematical derivation are given in the literature¹⁵⁻¹⁷ and are not repeated here.
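For orientation only, the textbook soft-margin formulation that corresponds to Figure 1 can be written as follows. This is the standard form rather than a derivation taken from this study; the symbols $w$, $b$, $ξ_{i}$, $C$ and the label coding $y_{i} \in \{-1, +1\}$ are notational assumptions (the experiments in this paper use labels 1 and 0), and $x_{i}$ here denotes the feature vector of the $i$ th training sample.

$$\min_{w, b, ξ} \; \frac{1}{2} {\lVert w \rVert}^{2} + C \sum_{i = 1}^{n} ξ_{i} \quad \text{subject to} \quad y_{i} (w^{\top} x_{i} + b) \geq 1 - ξ_{i}, \quad ξ_{i} \geq 0$$

In this notation, the hyperplane is $w^{\top} x + b = 0$, the two margins are $w^{\top} x + b = \pm 1$, and the distance between the margins is $2 / \lVert w \rVert$. The support vectors are the training samples with $y_{i} (w^{\top} x_{i} + b) \leq 1$, that is, the samples on or inside the margins.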

Figure 1.

SVM method.

Compared with other machine learning methods, the SVM method has several advantages: it can perform well even with a small sample size while requiring less computing time, it always yields the optimal solution of its (convex) training problem, and it can handle nonlinear problems and multi-class cases. On the other hand, the SVM classifier is sensitive to missing data because SVMs rely only on a subset of the observations (the support vectors),¹⁸ which is a disadvantage of this method. There are many ways to deal with this problem,¹⁹,²⁰ which are not discussed in detail in this study.

Feature generation

Feature generation is the process of taking raw, unstructured data and defining features (i.e., variables) for potential use in statistical analysis. This study used some common features in two domains: time domain and time-frequency domain. They will be introduced in this section.

Time domain features

Several time domain quantities are used for condition monitoring, such as root mean square (RMS), skewness, and kurtosis.²¹,²² These features are defined below. Here, $x_{i}$ is the time domain data such as a vibration signal, $n$ is the total number of data points, and $\bar{x}$ is the mean value of $x_{i}$ .

Root mean square (RMS)

$$\mathrm{RMS} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} x_{i}^{2}} \tag{1}$$

Skewness

$$\mathrm{Skewness} = \frac{n}{(n - 1) (n - 2)} \sum_{i = 1}^{n} \frac{{(x_{i} - \bar{x})}^{3}}{s^{3}} \tag{2}$$

Kurtosis

$$\mathrm{Kurtosis} = \frac{n (n + 1)}{(n - 1) (n - 2) (n - 3)} \sum_{i = 1}^{n} \frac{{(x_{i} - \bar{x})}^{4}}{s^{4}} - \frac{3 {(n - 1)}^{2}}{(n - 2) (n - 3)} \tag{3}$$

Average

$$\mathrm{ave} = \frac{1}{n} \sum_{i = 1}^{n} x_{i} \tag{4}$$

Coefficient of variation (C.V.)

$$\mathrm{C.V.} = \frac{s}{\mathrm{ave}} \tag{5}$$

where $s$ represents the standard deviation (STD).
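The time domain definitions above can be sketched in plain Python as follows. This is a minimal illustration rather than the code used in the study; the function names are hypothetical, and $s$ is assumed to be the sample standard deviation (denominator $n - 1$), which is the convention under which the bias-corrected skewness and kurtosis formulas are normally stated.

```python
import math


def mean(x):
    """Average, equation (4)."""
    return sum(x) / len(x)


def sample_std(x):
    """Standard deviation s. Assumption: sample form with n - 1 in the denominator."""
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))


def rms(x):
    """Root mean square, equation (1)."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def skewness(x):
    """Bias-corrected sample skewness, equation (2). Needs n >= 3."""
    n, m, s = len(x), mean(x), sample_std(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in x)


def kurtosis(x):
    """Bias-corrected sample excess kurtosis, equation (3). Needs n >= 4."""
    n, m, s = len(x), mean(x), sample_std(x)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * sum(((v - m) / s) ** 4 for v in x) - correction


def coefficient_of_variation(x):
    """Coefficient of variation, equation (5)."""
    return sample_std(x) / mean(x)


# Toy data, not measured vibration data.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "rms": rms(data),
    "skewness": skewness(data),
    "kurtosis": kurtosis(data),
    "cv": coefficient_of_variation(data),
}
```

For the symmetric toy sequence the skewness is zero up to rounding, which is a quick sanity check on the implementation.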

In this research, six kinds of time domain features are used: the mean values of RMS, kurtosis, and skewness, and the coefficients of variation (C.V.) of RMS, kurtosis, and skewness. They are referred to as RMS average, kurtosis average, skewness average, RMS average C.V., kurtosis average C.V., and skewness average C.V.

Every file contains 30 s of data. First, the features (RMS, kurtosis, and skewness) are calculated for each 0.5 s of data, which gives 600 datasets. Then, the mean value and the C.V. of these 600 datasets are calculated to obtain the “average” and the “C.V.”
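The two-stage calculation described above (a feature per short segment, then the average and C.V. across segments) might look like the following sketch. The segmentation details are assumptions: the text does not say whether segments overlap, so consecutive non-overlapping windows are used here, the window length is left as a parameter, and all function names are hypothetical.

```python
import math


def rms(segment):
    """Root mean square of one segment."""
    return math.sqrt(sum(v * v for v in segment) / len(segment))


def split_windows(signal, window_len):
    """Split a signal into consecutive, non-overlapping windows (assumed layout).

    Trailing samples that do not fill a whole window are dropped.
    """
    count = len(signal) // window_len
    return [signal[i * window_len:(i + 1) * window_len] for i in range(count)]


def average_and_cv(values):
    """Return (average, C.V.) of a list of per-window feature values."""
    n = len(values)
    ave = sum(values) / n
    s = math.sqrt(sum((v - ave) ** 2 for v in values) / (n - 1))
    return ave, s / ave


def windowed_feature(signal, window_len, feature=rms):
    """Compute a feature per window, then summarize it across windows."""
    per_window = [feature(w) for w in split_windows(signal, window_len)]
    return average_and_cv(per_window)


# Toy signal: one quiet window followed by one louder window.
signal = [1.0, -1.0] * 4 + [2.0, -2.0] * 4
rms_ave, rms_cv = windowed_feature(signal, window_len=8)
```

Passing a different callable as `feature` (for example a kurtosis or skewness function) gives the corresponding "average" and "C.V." pair in the same way.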

Time-frequency domain features

A discrete wavelet transform (DWT) is a transform that decomposes a given signal into a number of sets, where each set is a time series of coefficients describing the time evolution of the signal in the corresponding frequency band.²³ The wavelet function used was Daubechies (N = 20), and one-dimensional wavelet decomposition from level 1 to level 4 was performed. The mathematical formulas are shown below.

The function $f_{j} (x)$ is expanded with the scaling function $ϕ$ as

$$f_{j} (x) = \sum_{k} c_{k}^{(j)} 2^{j / 2} ϕ (2^{j} x - k) = \sum_{k} c_{k}^{(j)} ϕ_{j, k} (x) \tag{6}$$

The corresponding expansion with the wavelet function $ψ$ is

$$g_{j} (x) = \sum_{k} d_{k}^{(j)} 2^{j / 2} ψ (2^{j} x - k) = \sum_{k} d_{k}^{(j)} ψ_{j, k} (x) \tag{7}$$

The method of expanding the function $f_{j} (x)$ into wavelet functions with different resolutions (levels) is called multi-resolution analysis.

$$f_{j} (x) = g_{j - 1} (x) + g_{j - 2} (x) + \dots + g_{j - m} (x) + f_{j - m} (x) \tag{8}$$

The following set of functions is used as the basis function.

$$\{ϕ_{j - m, k} (x), ψ_{j - m, k} (x), \dots, ψ_{j - 1, k} (x) \mid k \in \mathbb{Z}\} \tag{9}$$

Here, $f_{j} (x)$ is the original function to which multi-resolution analysis is applied, $g_{j} (x)$ is the level-specific component of $f_{j} (x)$ , the subscript $j$ is an integer parameter that represents dilation or contraction by factors of two, $m$ is a natural number that represents the number of levels, and $k$ is the integer shift index that selects the translated copy of $ϕ$ or $ψ$ .
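The structure of equation (8) can be illustrated with a short multi-level decomposition. Note the substitution: the sketch below uses the Haar wavelet because its filters fit in two lines, whereas the study uses a Daubechies wavelet (N = 20). The function names are hypothetical, and summarizing each level by its coefficient energy is only one possible way of turning a level into a scalar; the text does not state how the D1 to D4 features were reduced.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * inv_sqrt2
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * inv_sqrt2
              for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return ([d_1, ..., d_levels], a_levels).

    d_1 plays the role of g_{j-1} in equation (8), d_levels that of g_{j-m},
    and a_levels that of the remaining coarse part f_{j-m}.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return details, approx


def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)


# Toy signal, not measured vibration data.
signal = [float((i * 7) % 5) for i in range(16)]
details, approx = haar_decompose(signal, levels=4)
band_energies = [energy(d) for d in details]
```

Because the Haar basis is orthonormal, the energies of the four detail levels and the final approximation add up to the energy of the original signal, which mirrors the additive split in equation (8).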

Feature selection method

In machine learning and statistics, feature selection is the process of selecting a subset of relevant features for use in model construction. Feature selection techniques are often used to avoid the curse of dimensionality and to shorten training times. In addition, they can improve the performance of the model in some cases. In recent years, it has been shown that feature selection can improve the compatibility of the data with a learning model class²⁴ and encode inherent symmetries present in the input space.²⁵

Feature selection techniques in machine learning can be broadly classified into supervised and unsupervised techniques. Supervised techniques are applied to labeled data and are used to identify the relevant features for increasing the efficiency of supervised models such as classification and regression. Unsupervised techniques are usually applied to unlabeled data. As this paper uses the SVM method, which is a supervised method, supervised feature selection techniques are used. Among the supervised techniques, this study proposes a hybrid of the filter method and the wrapper method.

Filter method (Fisher score)

In filter methods, the selection of features is independent of the classifier used. They rely on the general characteristics of the training data to select features with independence of any predictor.²⁶

Filter methods rank features by using a proxy measure instead of the error rate. In the Fisher score method, for example, the distances between the data points of each feature are measured, and features are ranked according to the following criterion: a feature is given a higher score if the distances between data points in different classes are larger while the distances between data points in the same class are smaller. The equation of the Fisher score method⁸ is shown below.

$$F (i) = \frac{{({\bar{x}}_{i}^{(+)} - {\bar{x}}_{i})}^{2} + {({\bar{x}}_{i}^{(-)} - {\bar{x}}_{i})}^{2}}{\frac{1}{n_{+} - 1} \sum_{k = 1}^{n_{+}} {(x_{k, i}^{(+)} - {\bar{x}}_{i}^{(+)})}^{2} + \frac{1}{n_{-} - 1} \sum_{k = 1}^{n_{-}} {(x_{k, i}^{(-)} - {\bar{x}}_{i}^{(-)})}^{2}} \tag{10}$$

In this equation, ${\bar{x}}_{i}$ , ${\bar{x}}_{i}^{(+)}$ , and ${\bar{x}}_{i}^{(-)}$ are the averages of the $i$ th feature over the whole, positive, and negative datasets, respectively, $n_{+}$ and $n_{-}$ are the numbers of positive and negative samples, and $x_{k, i}^{(+)}$ and $x_{k, i}^{(-)}$ are the $i$ th feature of the $k$ th positive and negative sample. The numerator indicates the discrimination between the positive and negative sets, while the denominator indicates the scatter within each of the two sets. The larger the Fisher score, the more discriminative the feature is likely to be.
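Equation (10) translates almost line by line into code. The sketch below is illustrative only: the function names are hypothetical, and treating label 1 as the positive class and label 0 as the negative class is an assumption made for the example.

```python
def fisher_score(positive, negative):
    """Fisher score of one feature, equation (10).

    positive / negative: values of that feature in the two classes.
    """
    n_pos, n_neg = len(positive), len(negative)
    mean_pos = sum(positive) / n_pos
    mean_neg = sum(negative) / n_neg
    mean_all = (sum(positive) + sum(negative)) / (n_pos + n_neg)
    # Numerator: separation of the class means from the overall mean.
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Denominator: sum of the two within-class sample variances.
    var_pos = sum((v - mean_pos) ** 2 for v in positive) / (n_pos - 1)
    var_neg = sum((v - mean_neg) ** 2 for v in negative) / (n_neg - 1)
    return between / (var_pos + var_neg)


def rank_features(samples, labels):
    """Score every feature column and return (indices by descending score, scores).

    Assumption: label 1 is the positive class and label 0 the negative class.
    """
    n_features = len(samples[0])
    scores = []
    for i in range(n_features):
        pos = [row[i] for row, y in zip(samples, labels) if y == 1]
        neg = [row[i] for row, y in zip(samples, labels) if y == 0]
        scores.append(fisher_score(pos, neg))
    order = sorted(range(n_features), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy data: feature 0 separates the classes, feature 1 does not.
samples = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0],
           [7.0, 4.0], [8.0, 5.0], [9.0, 3.0]]
labels = [1, 1, 1, 0, 0, 0]
order, scores = rank_features(samples, labels)
```

A feature whose values are constant within both classes would make the denominator zero, so a practical implementation would need to guard against that case.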

Wrapper method (sequential forward selection (SFS) method)

Wrapper methods work by evaluating a subset of features using a machine learning algorithm that employs a search strategy to look through the space of possible feature subsets, evaluating each subset based on the quality of the performance of a given algorithm. Wrapper models involve optimizing a predictor as part of the selection process. They tend to give better results.²⁶ Wrapper methods use a predictive model to score feature subsets. Each new subset is used to train a model, which is tested on a hold-out set. Counting the number of mistakes made on that hold-out set (the error rate of the model) gives the score for that subset. Filter methods are usually computationally less expensive than wrappers.

In this study, sequential forward selection (SFS) is used as the wrapper method. The SFS method is simple and effective. It starts from the empty set and sequentially adds the feature $x^{+} $ that gives the highest value of the objective function $J (Y_{k} + x^{+}) $ when combined with the features $Y_{k}$ that have already been selected.
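A generic version of this greedy search is sketched below. The objective $J$ is passed in as a function, so the same loop works with any classifier score. The stopping rule (stop when no candidate improves $J$) is an assumption, since the text does not state one, and the toy objective is purely hypothetical.

```python
def sequential_forward_selection(candidates, objective, max_features=None):
    """Greedy SFS: start from the empty set and add one feature at a time.

    objective(subset) must return a score where higher is better.
    Assumed stopping rule: stop when no remaining feature improves the score.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate J(Y_k + x) for every feature x that is not yet selected.
        scored = [(objective(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical objective: each feature has a fixed gain, and every selected
# feature costs one unit, so adding a useless feature lowers the score.
gain = {"a": 6, "b": 3, "c": 0}


def toy_objective(subset):
    return sum(gain[f] for f in subset) - len(subset)


selected, best = sequential_forward_selection(["a", "b", "c"], toy_objective)
```

With $d$ candidate features, this loop evaluates at most $d (d + 1) / 2$ subsets, compared with $2^{d} - 1$ for exhaustive search.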

Proposed hybrid feature selection method

The proposed hybrid feature selection method combines the Fisher score (FS) with sequential forward selection (SFS) and aims to select the more important features from all the features for SVM models for journal bearing diagnosis. The procedures of diagnosis by the SVM method and of the proposed hybrid feature selection method are shown in Figure 2 and Figure 3. The steps are as follows:

Figure 2.

Diagnosis procedure using SVM method.

Figure 3.

Procedure of proposed feature selection method.

(a) Raw data acquisition: obtain raw data from the experiments.

(b) Feature generation: calculate the features (including features in time domain, frequency domain…) and assign labels according to the experimental records.

(c) Feature selection: first use the Fisher score method to delete the irrelevant features, then use the SFS method to delete the redundant features.

(d) Data division: divide the data into a training dataset and a test dataset.

(e) Model training: input the training dataset and the corresponding labels to train the SVM model (including kernel function selection).

(f) Model prediction: use the test dataset and the trained model to make predictions.

(g) Evaluation: calculate the accuracy of the SVM model.
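Steps (c) to (g) can be strung together as in the following sketch. Several simplifications are assumptions made to keep the example dependency-free: a nearest-centroid classifier stands in for the SVM, the filter threshold is applied to each feature's share of the total Fisher score, the wrapper stops when accuracy no longer improves, and all names and the toy data are hypothetical.

```python
def fisher_score(pos, neg):
    """Fisher score of one feature from its positive and negative class values."""
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    mean_all = (sum(pos) + sum(neg)) / (len(pos) + len(neg))
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    within = (sum((v - mean_pos) ** 2 for v in pos) / (len(pos) - 1)
              + sum((v - mean_neg) ** 2 for v in neg) / (len(neg) - 1))
    return between / within


def centroid_accuracy(train, test, features):
    """Stand-in for 'train an SVM on the chosen features, then score it'."""
    (x_train, y_train), (x_test, y_test) = train, test
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(x_train, y_train) if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, y in zip(x_test, y_test):
        dist = {label: sum((x[f] - c) ** 2 for f, c in zip(features, cen))
                for label, cen in centroids.items()}
        correct += min(dist, key=dist.get) == y
    return correct / len(y_test)


def hybrid_select(train, test, threshold):
    """Filter stage (Fisher score) followed by wrapper stage (SFS)."""
    x_train, y_train = train
    n_features = len(x_train[0])
    scores = []
    for i in range(n_features):
        pos = [x[i] for x, y in zip(x_train, y_train) if y == 1]
        neg = [x[i] for x, y in zip(x_train, y_train) if y == 0]
        scores.append(fisher_score(pos, neg))
    total = sum(scores)
    # Filter: drop features whose share of the total score is below threshold.
    kept = [i for i in range(n_features) if scores[i] / total >= threshold]
    # Wrapper: greedily add the kept feature that gives the best accuracy.
    selected, best = [], -1.0
    while len(selected) < len(kept):
        trials = [(centroid_accuracy(train, test, selected + [f]), f)
                  for f in kept if f not in selected]
        acc, feat = max(trials, key=lambda t: t[0])
        if acc <= best:
            break
        selected.append(feat)
        best = acc
    return kept, selected, best


# Toy data: feature 0 is informative, feature 1 is irrelevant,
# feature 2 is weaker and redundant once feature 0 is selected.
train = ([[1.0, 5.0, 2.0], [2.0, 3.0, 1.0], [3.0, 4.0, 3.0],
          [7.0, 4.0, 4.0], [8.0, 5.0, 6.0], [9.0, 3.0, 5.0]],
         [1, 1, 1, 0, 0, 0])
test = ([[2.0, 4.0, 2.0], [1.5, 5.0, 3.0], [8.0, 3.0, 5.0], [7.5, 4.0, 3.0]],
        [1, 1, 0, 0])
kept, selected, best = hybrid_select(train, test, threshold=0.01)
```

In the toy run the filter removes the irrelevant feature and the wrapper then discards the redundant one, which is the division of labor described in step (c).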

Experiment and data description

Test rigs and journal bearing description

The experimental equipment and the experimental data of the journal bearing with a little muddy water used in this study are explained in this section. Two experimental test rigs are used: the vertical and the horizontal rotor systems. These are shown in Figures 4–8. The parameters of the journal bearings and the test rigs are shown in Table 1. Both systems were stopped periodically, and the bearings were checked for damage.

Figure 4.

Vertical rotor system.

Figure 5.

Test rig of vertical rotor system.

Figure 6.

Journal bearing part of vertical shaft.

Figure 7.

Horizontal rotor system setup.

Figure 8.

Journal bearing part of horizontal shaft.

Table 1.

Part of the experiment data record.

	Bearing no.	Bearing inner diameter D[mm]	Bearing length L[mm]	Design clearance [mm]	Bearing material	Sleeve material
Vertical system	1	65	60	0.255	SUS316	Carbon
Horizontal system	1	65	150	0.288	Resin	SUS304
	2	65	150	0.288	Resin	SUS304

Figures 4 and 5 show the test rig of the vertical rotor system. The shaft was driven by the motor attached at the upper end, and the lower end of the shaft was supported by the journal bearing. Figure 6 shows the shaft part and the bearing part of the journal bearing. In the vertical rotor system experiment, the journal bearing showed no obvious physical damage even after the long-term experiment with a little muddy water. Therefore, the obtained data were for only one journal bearing.

Figure 7 shows the test rig of the horizontal rotor system. The shaft was driven by the motor attached at one end of the shaft, and the other end of the shaft was supported by the journal bearing. Figure 8 shows the shaft part and the bearing part of the journal bearing of the horizontal shaft. In the horizontal shaft experiment with a little muddy water, the vibration gradually changed, and the journal bearing was replaced because of wear and peeling. Therefore, data for two journal bearings were obtained.

Experimental data statement

Vertical rotor system

Rotation speed ω: 1358 rpm (22 Hz).

Materials: Bearing SUS316, sleeve carbon.

Sampling: 3000 Hz.

Data are sampled every 36 s; each recording lasts 30 s at a sampling frequency of 3 kHz, so every file contains 30 s of data. The vertical rotor system experiment was carried out for 3 hours a day. To investigate the effect of the running condition, the data were classified into three groups and numbered: (1) at startup, (2) 1.5 h after the start of operation, and (3) before the stop of operation (3 h after the start of operation).

This 3 h test was conducted on a total of 57 days (about 171 h in total). However, the operating condition remained normal, and the journal bearing appeared to be in good condition in the periodic visual checks. Figure 9 shows the journal bearing temperature and torque signals. Some changes in the journal bearing temperature and the torque occurred between 40 h and 50 h (No.40–48) and after 100 h of running (No.89–128).

Figure 9.

Change of journal bearing temperature and torque in vertical rotor system.

The training dataset for SVM modeling consisted of data No.1 to No.128 (1∼143 h). The trained SVM model was then tested. The test datasets were divided into two parts and numbered:

Test part 1: selected randomly from the remaining files collected in the period 1 h∼143 h, No.1∼No.70

Test part 2: selected randomly from the files collected in the period 143 h∼171 h, No.1∼No.26

Horizontal rotor system

Rotation speed ω: 500 rpm (8.3 Hz).

Material: Bearing = Resin, shaft sleeve = SUS304.

Sampling: 1000 Hz, 30 s.

Except for the sampling frequency, the data collection method of the horizontal system was similar to that of the vertical system. During the use of journal bearing 2, significant peeling was found after data No.15, at the first periodic check. Figure 10 shows the change of the journal bearing temperature, which also indicates a change at around data No.15.

Figure 10.

Change of temperature in horizontal rotor system.

In the modeling of the SVM, data No. 1 to 15 were used as the data in the normal condition, and data No. 20 to 24 were used as the data in the warning condition. Then, the trained SVM model was tested using both data No. 16 to 24, which were recorded after peeling (labeled as warning data), and data No. 1 to 15, which were recorded before peeling (labeled as normal data).

Analysis of feature signals

Calculation of feature signals

The features described in the Feature generation section are used in the analysis of the journal bearing experimental data. All 20 features used for training the SVM model and for the analysis are shown in Table 2. In the time domain, there are 12 features: 6 features for each of the $x$ and $y$ directions. Figure 11 and Figure 12 show these features for the vertical and horizontal rotor systems, respectively. In the time-frequency domain, there are 8 features: 4 levels of DWT for each of the $x$ and $y$ directions. Figures 13 and 14 show these features for the vertical and horizontal rotor systems, respectively. However, it is difficult to diagnose the occurrence of wear or peeling of the journal bearing directly from these 20 features. Therefore, in the next section, these feature signals are labeled, and an SVM model is trained for each of the vertical and horizontal systems.

Table 2.

Features used for SVM model of journal bearing data.

Feature number	Feature name
Feature no. 1	RMS_Ave_X
Feature no. 2	RMS_CV_X
Feature no. 3	kurtosis_Ave_X
Feature no. 4	kurtosis_CV_X
Feature no. 5	skewness_Ave_X
Feature no. 6	skewness_CV_X
Feature no. 7	RMS_Ave_Y
Feature no. 8	RMS_CV_Y
Feature no. 9	kurtosis_Ave_Y
Feature no. 10	kurtosis_CV_Y
Feature no. 11	skewness_Ave_Y
Feature no. 12	skewness_CV_Y
Feature no. 13	D1_X
Feature no. 14	D2_X
Feature no. 15	D3_X
Feature no. 16	D4_X
Feature no. 17	D1_Y
Feature no. 18	D2_Y
Feature no. 19	D3_Y
Feature no. 20	D4_Y

Figure 11.

Features of vertical rotor system in time domain.

Figure 12.

Features of horizontal rotor system in time domain.

Figure 13.

Features of vertical rotor system in time-frequency domain (a) DWT transformation in x direction (Lv1∼Lv4) and (b) DWT transformation in y direction (Lv1∼Lv4).

Figure 14.

Features of horizontal rotor system in time-frequency domain. (a) DWT transformation in x direction (Lv1~Lv4) and (b) DWT transformation in y direction (Lv1~Lv4).

Label the samples

Vertical rotor system

According to the Experimental data statement section and Figure 9, the labels are defined based on the changes of the journal bearing temperature. The selected data (training dataset: No.1∼No.128; test datasets: No.1∼No.70 and No.1∼No.26) can be divided into states as shown in Table 3:

Table 3.

Journal bearing data for training and test (vertical rotor system).

Training dataset
Normal state	No.1–39 and No.49–88	Label 1
Warning state	No.40–48 and No.89–128	Label 0
Test dataset
Normal state	Test Part 1: No.1∼No.37	Label 1
Warning state	Test Part 1: No.38∼No.70	Label 0
Normal state	Test Part 2: No.1∼No.26	Label 1
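The training labels in Table 3 follow directly from the file numbers, as the following sketch shows. The function and constant names are hypothetical; only the ranges and the label coding (1 for normal, 0 for warning) come from the table.

```python
# Warning-state file ranges of the vertical-system training set (Table 3).
WARNING_RANGES = [(40, 48), (89, 128)]


def label_training_file(number):
    """Return 1 (normal state) or 0 (warning state) for a training file number."""
    if not 1 <= number <= 128:
        raise ValueError("training files are numbered 1 to 128")
    in_warning = any(lo <= number <= hi for lo, hi in WARNING_RANGES)
    return 0 if in_warning else 1


labels = {n: label_training_file(n) for n in range(1, 129)}
normal_count = sum(labels.values())
warning_count = len(labels) - normal_count
```

Counting the labels gives 79 normal and 49 warning files, so the two classes in the vertical-system training set are moderately imbalanced.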

Horizontal rotor system

According to the Experimental data statement section and Figure 10, the labels are defined based on the changes of the journal bearing temperature and the periodic check of the journal bearing (peeling after No.15).

Selected data (Exp No.1∼No.24) can be divided into two states as shown in Table 4.

Table 4.

Journal bearing data for training and test (horizontal system).

Training dataset
Normal state	No.1∼No.5	Label 1
Warning state	No.16∼No.20	Label 0
Test dataset
Normal state	No.1∼No.15	Label 1
Warning state	No.16∼No.24	Label 0

Fault diagnosis result and discussion

Kernel functions

This section shows the results of fault diagnosis of the journal bearing for the vertical and horizontal rotor systems. Figures 15 and 16 show 3D plots of the trained SVM models for the vertical and horizontal systems, respectively. Three different kernel functions were used, and their prediction accuracies were compared. In these figures, all 20 features are used in the training and testing of the SVM model. For both the vertical and horizontal rotor systems, panels (a), (b), and (c) indicate that the SVM hyperplane successfully distinguishes the normal and warning conditions. As a result, panels (d), (e), and (f) show that the prediction results agree closely with the correct labels of the data. In particular, Figure 16(f) shows good agreement between the predicted labels and the correct labels. Table 5 summarizes the results shown in Figure 15 and Figure 16 for both rotor systems. It indicates that the polynomial kernel function is the most appropriate one for the SVM model of the journal bearing in both the vertical and horizontal systems.

Figure 15.

3D plot and prediction results of SVM model - vertical rotor system. (a) visualization using linear kernel function, (b) visualization using RBF kernel function, (c) visualization using RBF kernel function, (d) prediction result using polynomial kernel function, (e) prediction result using linear kernel function, and (f) prediction result using RBF kernel function.

Figure 16.

3D plot and prediction results of SVM model - horizontal rotor system. (a) visualization using linear kernel function, (b) visualization using RBF kernel function, (c) visualization using RBF kernel function, (d) prediction result using polynomial kernel function, (e) prediction result using linear kernel function, and (f) prediction result using RBF kernel function.

Table 5.

Accuracy of SVM model using different kernel function.

kernel function	Vertical system	Horizontal system
Polynomial	97.14%	100.00%
Linear	91.43%	83.33%
RBF	92.86%	83.33%
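For reference, the three kernel functions compared in Table 5 have the following standard forms. The hyperparameters (`degree`, `gamma`, `coef0`) and their default values below are assumptions chosen for illustration; the text does not report the values used in the experiments.

```python
import math


def dot(u, v):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    """K(u, v) = u . v"""
    return dot(u, v)


def polynomial_kernel(u, v, degree=3, gamma=1.0, coef0=1.0):
    """K(u, v) = (gamma * u . v + coef0) ** degree (assumed hyperparameters)."""
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    """K(u, v) = exp(-gamma * ||u - v||^2) (assumed hyperparameter)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


# Toy feature vectors, not experimental data.
u, v = [1.0, 2.0], [3.0, 4.0]
kernel_values = {
    "linear": linear_kernel(u, v),
    "polynomial": polynomial_kernel(u, v),
    "rbf": rbf_kernel(u, v),
}
```

The linear kernel keeps the decision boundary flat in the original feature space, whereas the polynomial and RBF kernels allow curved boundaries, which is one possible reason for the accuracy differences between the rows of Table 5.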

Discussion on the feature selection using hybrid method

Finally, using the most appropriate kernel function, the prediction accuracy of the SVM model with different features is examined. The proposed hybrid algorithm is used to select the features of the SVM model for the journal bearing. First, the filter method (Fisher score) is applied to rank the features by importance. Table 6 shows the resulting ranking. In this table, the left side is the ranking of the features of the vertical system, and the right side is the ranking of the features of the horizontal system; both are obtained by the Fisher score method. Table 6 indicates that relatively many features of the vertical rotor system have a high Fisher score, which makes it difficult to focus on only a few features.

Table 6.

Comparison of vertical and horizontal systems’ results after FS method.

Then, the features whose values in Table 6 are less than 1% are removed as irrelevant, and the wrapper method (SFS method) is applied to the retained features. Table 7 shows the result (optimal feature subset) obtained by the hybrid of the Fisher score method and the SFS method. The optimal subset contains three features for the vertical system and one feature for the horizontal system.

Table 7.

Optimal subset using hybrid method of both Fisher score method and SFS method.

(a) Vertical system
No.3	No.7	No.12
Kurtosis average_X	RMS average_Y	Skewness CV_Y
(b) Horizontal system
No.1
RMS average_X

Note that, for the vertical rotor system, the optimal features shown in Table 7 are not the highest-ranked ones in the FS ranking shown in Table 6. This reflects a known drawback of the filter method (Fisher score): it completely ignores the performance of the selected features on a specific classifier and evaluates the features individually. In contrast, wrapper methods optimize a predictor as part of the selection process, which tends to give better results. The optimal feature subset should depend on the specific biases and heuristics of the classifier, which the wrapper method takes into account. This explains why the results of FS (filter method) and SFS (wrapper method) differ, and the proposed hybrid method exploits the advantages of both.

Figures 17 and 18 show the validity of the optimal feature subset chosen by the proposed hybrid method for the vertical system. In both cases, Figure 17(d) and Figure 18(d), which use all the optimal features, show the best diagnosis result. Figure 19 shows the validity of the optimal feature chosen by the hybrid method for the horizontal system. In this case, Figure 19(a), which uses the optimal feature, shows the best diagnosis result, and it is the same as the result using all features shown in Figure 19(c).

Figure 17.

Prediction result of vertical rotor system (using polynomial kernel) - Test part 1. (a) using 1 feature: feature No.3, (b) using 2 features (No.3, 12), (c) using 2 features (No.3, 7), and (d) using 3 features (No.3, 7, 12).

Figure 18.

Prediction result of vertical rotor system (using polynomial kernel) - Test part 2. (a) using features randomly: No. 2,10,15, (b) using feature No. 12, (c) using feature No. 7, and (d) using selected features (No. 3, 7, 12).

Figure 19.

Prediction result of horizontal rotor system (using polynomial kernel). (a) using 1 feature: feature No. 1, (b) using features except No. 1, (c) using 20 features, and (d) using 1 feature No. 19.

The performances of the SVM model with different features for the vertical and horizontal rotor systems are summarized in Table 8 and Table 9, respectively. As shown in the tables, the features selected by the proposed hybrid method perform as well as or better than all the other cases, which supports the effectiveness of the proposed hybrid feature selection method.

Table 8.

Performances of SVM model with different features (vertical system).

	Vertical rotor system
Feature set	Feature number in Table 2	Accuracy for test 1	Accuracy for test 2
All features	All (20 features)	97.14%	100.00%
Optimal subset by hybrid method of FS and SFS	No. 3, 7, 12	97.14%	100.00%
Part of optimal subset by hybrid method of FS and SFS	No. 7, 12	95.71%	100.00%
	No. 3, 7	88.57%	61.54%
	No. 3, 12	87.14%	53.85%
	No. 3	68.57%	0.00%
Top features by FS	No. 2, 8, 9	85.00%	100.00%
Low-ranked features (lower score) by FS	No. 14, 18, 6	52.86%	0.00%
Randomly selected	No. 4, 10, 15	52.86%	0.00%
Randomly selected	No. 5, 6, 8, 14, 19	52.86%	0.00%

Table 9.

Performances of SVM model with different features (horizontal system).

	Horizontal rotor system
Feature set	Feature number	Accuracy
All features	All (20 features)	100.00%
Optimal subset by hybrid method of FS and SFS	No. 1	100.00%
Top features by FS	No. 1, 5, 17	100.00%
Randomly selected	No. 19	75.00%
	No. 7, 19	50.00%
	19 features (all except No. 1)	75.00%
Low-ranked features (lower score) by FS	No. 7, 14, 18	50.00%

According to Table 6 and Table 7, when the weight of one feature is much higher than those of the others, as in the case of the horizontal system, the optimal subset may be obtained by performing the filter method alone. However, when several of the highest-ranked features have weights of the same order, better results are obtained by using the proposed hybrid feature selection method. When the results of the filter method and the wrapper method are inconsistent, wrapper models tend to give better results because they use the target function as part of the selection process.
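This rule of thumb can be written as a small check on the filter weights. The sketch below is only illustrative: the function name, the `dominance_ratio` threshold of 10, and the example weights are assumptions made for the illustration and are not values taken from Table 6.

```python
def filter_alone_may_suffice(scores, dominance_ratio=10.0):
    """Return True when the top filter weight exceeds the runner-up by the given ratio.

    The threshold is a hypothetical choice, not a value from the paper.
    """
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2:
        return True
    top, runner_up = ranked[0], ranked[1]
    if runner_up <= 0:
        return top > 0
    return top / runner_up >= dominance_ratio


# Hypothetical weights: one dominant feature versus several weights of the same order.
dominant_case = [8.0, 0.3, 0.2, 0.1]
same_order_case = [1.2, 1.1, 0.9, 0.2]

print(filter_alone_may_suffice(dominant_case))    # filter ranking alone is a reasonable start
print(filter_alone_may_suffice(same_order_case))  # follow the filter with a wrapper search
```

When the check fails, the wrapper step is what decides between the closely ranked candidates.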

In addition, Table 8 and Table 9 show that the performances obtained using all features are also good. Some papers argue that considering multiple features should improve the accuracy of the classifier because it increases the information available, but this is not always true. In some cases, the SVM model performs better with optimally selected features. Furthermore, even when using all features and using the optimally selected features give the same performance, the proposed method reduces the number of features, which decreases the computing time, the memory requirement, and the complexity of the model.
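The reduction can be read directly from Tables 8 and 9. As a small worked example, the snippet below transcribes the rows of both tables and looks for the smallest listed subset whose accuracies are at least those obtained with all 20 features. The dictionary layout and the helper name are illustrative choices; only the accuracy values come from the tables.

```python
# Rows transcribed from Table 8 (vertical system):
# feature subset -> (accuracy for test 1, accuracy for test 2), in percent.
# The key None stands for "all 20 features".
TABLE_8 = {
    None: (97.14, 100.00),
    (3, 7, 12): (97.14, 100.00),
    (7, 12): (95.71, 100.00),
    (3, 7): (88.57, 61.54),
    (3, 12): (87.14, 53.85),
    (3,): (68.57, 0.00),
    (2, 8, 9): (85.00, 100.00),
    (14, 18, 6): (52.86, 0.00),
    (4, 10, 15): (52.86, 0.00),
    (5, 6, 8, 14, 19): (52.86, 0.00),
}

# Rows transcribed from Table 9 (horizontal system): subset -> (accuracy,), in percent.
TABLE_9 = {
    None: (100.00,),
    (1,): (100.00,),
    (1, 5, 17): (100.00,),
    (19,): (75.00,),
    (7, 19): (50.00,),
    tuple(range(2, 21)): (75.00,),  # 19 features, all except No. 1
    (7, 14, 18): (50.00,),
}


def smallest_matching_subset(table):
    """Smallest listed subset whose accuracies all reach the all-feature baseline."""
    baseline = table[None]
    matches = [
        subset
        for subset, accuracies in table.items()
        if subset is not None and all(a >= b for a, b in zip(accuracies, baseline))
    ]
    return min(matches, key=len) if matches else None


vertical_best = smallest_matching_subset(TABLE_8)
horizontal_best = smallest_matching_subset(TABLE_9)
print(vertical_best, horizontal_best)
```

Among the listed subsets, three features for the vertical system and one feature for the horizontal system reproduce the accuracy of the full 20-feature model.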

Finally, these results indicate that the most important feature for the journal bearing of the horizontal rotor system is the mean value of RMS in the horizontal direction. Its importance is very high, and diagnosis may be possible using this feature alone. On the other hand, for the journal bearing of the vertical rotor system, the average coefficient of variation of RMS is the most important feature, followed by the mean value and average coefficient of variation of kurtosis and skewness. However, their importance is not dominant, and it is difficult to perform the diagnosis with high accuracy using only a few features.

Conclusion

In this research, long-term experiments were performed for both vertical and horizontal rotor systems supported by a journal bearing with a little muddy water, and the experimental data of the journal bearings were obtained and analyzed. No initial artificial failure was added to the system, and the journal bearing failure developed under normal rotating conditions. The support vector machine (SVM) method was applied for fault diagnosis, and the feature selection method was discussed. The most important feature for the journal bearing was also discussed. The following conclusions were obtained:

(1) Long-term experimental data including some failure symptoms were obtained successfully for both vertical and horizontal rotor systems supported by a journal bearing with a little muddy water. No initial artificial failure was added to the system, and the journal bearing failure developed under normal rotating conditions.

(2) Choosing an appropriate kernel function is important for SVM model training. The accuracies of the SVM models are higher than 97% when a suitable kernel function and suitable features are selected.

(3) A hybrid feature selection method combining a filter method (Fisher score) and a wrapper method (sequential forward selection) is proposed. The set of features is first reduced using the filter method, and then sequential forward selection is performed to obtain the optimal subset of features in a reasonable time. A filter method such as the Fisher score is effective only in some cases, whereas the proposed feature selection method tends to give better results. The proposed method can also decrease the computing time, memory, and complexity of the model by reducing the number of features.

(4) For the journal bearing of the horizontal rotor system, the most important feature is the mean value of RMS in the horizontal direction. Its importance is very high, and diagnosis may be possible using this feature alone. On the other hand, for the journal bearing of the vertical rotor system, the average coefficient of variation of RMS is the most important feature, followed by the mean value and average coefficient of variation of kurtosis and skewness. However, their importance is not dominant, and it is difficult to perform the diagnosis with high accuracy using only a few features.

ORCID iDs

Tsuyoshi Inoue https://orcid.org/0000-0002-6913-9196

Shota Yabui https://orcid.org/0000-0001-5535-9473

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

1. Lei. Intelligent Fault Diagnosis and Remaining Useful Life Prediction of Rotating Machinery. Xi’an, China: Xi’an Jiaotong University Press Co. Published by Elsevier Inc, 2017.

2. Lau ECC, Ngan. Detection of Motor Bearing Outer Raceway Defect by Wavelet Packet Transformed Motor Current Signature Analysis. New York, NY: IEEE, 2010.

3. Al-Raheem, Roy, Ramachandran, et al. Application of the Laplace-wavelet combined with ANN for rolling bearing fault diagnosis. J Vibration Acoust 2008; 130(5): 051007(1)–051007(9).

4. Kanai, Desavale, Chavan. Experimental-based fault diagnosis of rolling bearings using artificial neural network. J Tribology 2016; 138(3): 031103.

5. Gecgel, Dias, Ekwaro-Osire, et al. Simulation-driven deep learning approach for wear diagnostics in hydrodynamic journal bearings. J Tribology 2021; 143(8): 084501.

6. Unal, Sahin, Onat, et al. Fault diagnosis of rolling bearings using data mining techniques and boosting. J Dynamic Syst Meas Control 2017; 139(2): 021003.

7. Salunkhe, Desavale, Thim. Experimental frequency-domain vibration based fault diagnosis of roller element bearings using support vector machine. J Risk Uncertain Eng Syst B 2020; 7(2): 021001.

8. Chen Y-W, Lin C-J. Combining SVMs with Various Feature Selection Strategies, Feature Extraction Foundations and Applications. Berlin, Germany: Springer, 2006.

9. Nannan Gu, Fan, Ren. Efficient Sequential Feature Selection Based on Adaptive Eigenspace Model, Neurocomputing. Elsevier, 2015, pp. 925–2312.

10. Cateni, Colla, Vannucci, et al. A Hybrid Feature Selection Method for Classification Purposes, 2014 UKSim-AMSS 8th European modelling symposium, 2014.

11. Cheng, Wang, et al. Rotating machinery fault diagnosis based on multivariate multiscale fuzzy distribution entropy and Fisher score. Measurement 2021; 179: 109495.

12. Guyon, Elisseeff. An introduction to variable and feature selection. J Machine Learn Res 2003; 3(7–8): 1157–1182.

13. Cortes C, Vapnik V. Support-vector networks. Machine Learn 1995; 20(3): 273–297.

14. Boser, Guyon, Vapnik. A training algorithm for optimal margin classifiers. Proc Fifth Annu Workshop Comput Learn Theor - COLT 1992; 92: 144–152.

15. Anshori, Mar’I, Alauddin, et al. Prediction result of Dota 2 games using improved SVM classifier based on particle swarm optimization. In: 2018 International conference on sustainable information engineering and technology (SIET), Malang, Indonesia, 10–12 November 2018.

16. Avriel. Nonlinear Programming: Analysis and Methods. Mineola, NY, USA: Dover Publications, 2003.

17. Kavzoglu, Colkesen. A kernel functions analysis for support vector machines for land cover classification. Int J Appl Earth Observation Geoinformation 2009; 11(5): 352–359.

18. Stewart, Zeng. Constructing support vector machines with missing data. Wiley Interdiscip Rev Comput Stat 2018; 10(4): e1430.

19. Pelckmans, De Brabanter, Suykens JAK, et al. Handling missing values in support vector machine classifiers. Neural Networks 2005; 18(5–6): 684–692.

20. Honghai, Guoshun, Cheng, et al. A SVM regression based approach to filling in missing values. Knowledge-Based Intell Inf Eng Syst 2005; 3683: 581–587.

21. Sharma. Fault Identification in Roller Bearing Using Vibration Signature Analysis. Master’s Thesis, Thapar University, Patiala, 2011.

22. Kharche, Borole, Kharche. Fault detection in rolling element bearing using vibration based analysis. International Conference on Ideas, Impact and Innovation in Mechanical Engineering (ICIIIME), 2017.

23. Hosseinzadeh. Robust Control Applications in Biomedical Engineering: Control of Depth of Hypnosis, Control Applications for Biomedical Engineering Systems. Elsevier, 2020.

24. Kratsios, Hyndman. NEU: A meta-algorithm for universal UAP-invariant feature representation. J Machine Learn Res 2021; 22(92): 1–51.

25. Persello, Bruzzone. Relevant and invariant feature selection of hyperspectral images for domain generalization. In: IEEE geosci remote sensing symp, Quebec City, QC, 13–18 July 2014.

26. Roffo. Feature Selection Library (MATLAB Toolbox). Natick, MA, USA: Mathworks, 2018.