Integrative intrinsic time-scale decomposition and hierarchical temporal memory approach to gearbox diagnosis under variable operating conditions

Abstract

Gearbox diagnosis under stationary operating conditions has been extensively investigated; however, variable operating conditions such as load and speed changes strongly affect the accuracy of gearbox diagnosis. This article presents an integrative approach of intrinsic time-scale decomposition and hierarchical temporal memory for gearbox diagnosis under variable operating conditions. The approach has two modules: a feature extraction method and an integrated feature fusion and classification model. Intrinsic time-scale decomposition is used to extract gearbox features that are insensitive to variable operating conditions, and it outperforms the commonly used empirical mode decomposition in both decomposition quality and computational efficiency. Hierarchical temporal memory combines feature fusion and pattern classification in one model to diagnose gearbox defects autonomously. A performance comparison among the presented method, a back-propagation neural network, a support vector machine, and fuzzy c-means clustering on experimental data demonstrates the effectiveness of the presented method.

Keywords

Hierarchical temporal memory; intrinsic time-scale decomposition; gearbox diagnosis; variable operating conditions

Introduction

As an essential component in virtually all industrial processes, the gearbox is widely used to convert speed and torque and keep machinery operating normally.¹ A gearbox defect may cause failure of the whole system, leading to significant economic losses, costly downtime, and even catastrophic damage. Online monitoring and fault diagnosis of gearboxes are therefore of great importance for achieving high availability, reliability, and operational safety.

Effective signal processing algorithms play an important role in gearbox defect diagnosis and have become an active research field.^2–4 Typical gear faults, including pitting, chipping, and cracks, may cause amplitude or phase modulation of vibration signals; thus, advanced signal processing techniques have been investigated for gearbox diagnosis. In Elasha et al.,⁵ vibration analysis techniques including statistical metrics, spectral kurtosis, and envelope analysis are applied to diagnose naturally developed faults within gearboxes. In Fan and Zuo,⁶ a hybrid technique integrating the Hilbert transform and the wavelet packet transform is presented to improve time–frequency resolution for incipient gear fault detection. A sparsity-enabled signal decomposition method based on the tunable Q-factor wavelet transform has also been investigated for gearbox fault feature extraction.⁷ To avoid selecting the scale and base wavelet required by the wavelet transform, empirical mode decomposition (EMD), an adaptive signal processing technique, has been combined with statistical parameter analysis of vibration and acoustic signals to detect local faults of helical gears.⁸ Another study on wind turbine gearbox fault diagnosis addressed the nonlinear and nonstationary nature of the signals by combining EMD and the wavelet transform.⁹ A self-adaptive noise cancellation method presented in Tian and Qian¹⁰ eliminates white noise and effectively improves the accuracy of fault diagnosis. A least mean square–based adaptive filtering scheme has been investigated to diagnose tooth breakage of different severities.¹¹ These techniques are effective at fault feature extraction. However, the relationships between fault features and gearbox failure modes are not explicit, which makes the failure modes difficult to identify.¹²

To address this issue, artificial intelligence techniques have been investigated to classify gearbox defects using the extracted fault features as inputs. In Hajnayeb et al.,¹³ a system based on multilayer perceptron artificial neural networks (ANNs) was designed to classify four different gearbox conditions from vibration signals. Another study¹⁴ introduced Daubechies wavelets (db1–db15) for feature extraction from the vibration signals of a bevel gearbox under various conditions and faults, and used the J48 algorithm for feature selection and classification of the gearbox conditions. Statistical features, frequency-domain features, and EMD-based instantaneous energies have been applied to diagnose gear cracks of different levels.¹⁵ In Ma and Chen,¹⁶ time histories, spectrum analysis, and fractal dimension are used to classify tooth crack and spalling failures of a gear system. In Liang et al.,¹⁷ the fault symptoms of gear tooth cracks are identified and located by analyzing the effect of multiple vibration sources. An integrative approach of ensemble EMD (EEMD) and principal component analysis (PCA) is reported in Yang and Wu¹⁸ for gearbox defect diagnosis. These techniques were developed for, and are suited to, gearbox feature extraction and fault diagnosis under stationary operating conditions. In practice, however, gearboxes are subject to load and speed changes as well as external disturbances,¹⁹ so in some applications, such as wind turbines, they usually operate under variable conditions. Gearbox diagnosis under varying operating conditions is also of particular interest because such conditions may reveal transient defect features that enhance diagnosis.

Much effort has been devoted to extracting gearbox fault features under variable operating conditions. In Meltzer and Dien,²⁰ dynamic and three-dimensional finite element analytical models of cracked gears are established for gear fault diagnosis under different load conditions. Because a faulty gearbox is more sensitive to load than a healthy one, a regression equation describing the slope of the vibration meshing-component amplitude with respect to the instantaneous input speed has been selected as a gearbox fault feature under varying load conditions.²¹ In Cheng et al.,¹² a three-dimensional finite element model was built to reveal the stress intensity factors of a surface crack on a spur gear tooth. An envelope order spectrum has been developed to illustrate the amplitude- and frequency-modulated features of vibration signals under varying speed conditions.^22,23 In Heyns et al.,²⁴ an autoregressive exogenous (ARX)–based model is investigated to obtain a residual vibration signal representing gearbox fault features under fluctuating operating conditions. Various techniques have therefore been investigated for gearbox fault feature extraction under variable operating conditions. However, the extracted fault features still have to be interpreted visually to reach a diagnosis.

To advance gearbox defect diagnosis under variable operating conditions, this article presents an integrative approach of intrinsic time-scale decomposition (ITD) and hierarchical temporal memory (HTM) for autonomous gearbox defect diagnosis. ITD is a recently developed adaptive time–frequency analysis technique suited to the multi-component amplitude- and frequency-modulated signals produced by gearboxes. Features insensitive to varying operating conditions are extracted from the ITD decomposition results and used as inputs to HTM. HTM is a dynamic pattern classifier modeled on human brain activity that takes the compact features produced by ITD as its input. Its hierarchical structure identifies the most representative patterns in the features, removing the separate feature fusion step required by conventional pattern classifiers (e.g. support vector machine (SVM) and ANN). Experiments show that the integrative ITD–HTM approach yields higher classification accuracy in gearbox defect diagnosis under variable operating conditions.

The rest of this article is organized as follows. The theoretical background of ITD and HTM is discussed in section “Theoretical background.” The framework of the integrative approach is presented in section “Integrative approach of ITD and HTM,” which also covers the feature extraction strategy and a performance comparison between ITD and the commonly used EMD. Next, the effectiveness of the method is demonstrated in experimental studies on a gearbox testbed. Finally, conclusions are drawn.

Theoretical background

ITD

ITD, an adaptive time–frequency analysis method, was first proposed in Frei and Osorio.²⁵ It has been widely used in biomedical signal processing and bearing defect diagnosis.²⁶ Compared with other adaptive time–frequency analysis methods such as EMD and local mean decomposition (LMD), ITD offers clear advantages in computational efficiency and frequency resolution for complex, nonstationary signals. First, ITD was designed for nonstationary, time-varying signals: the instantaneous frequency and amplitude of the proper rotation components (PRCs) are preserved accurately, and this instantaneous information is essential under variable working conditions. Second, ITD effectively controls end effects, confining distortion to the segments before the first and after the last extreme points. Third, ITD dispenses with the time-consuming interpolation and sifting procedures of EMD, which makes it more efficient. ITD is therefore well suited to dynamic analysis of large quantities of raw data and is promising for gearbox signal processing.

Instead of the iterative envelope extraction used in EMD, ITD applies a linear transformation to adaptively decompose the signal into a series of mutually independent PRCs. For a signal X_t, ITD uses a baseline-extracting operator ξ. The first decomposition of X_t is

X_{t} = ξ X_{t} + (1 - ξ) X_{t} = L_{t} + H_{t}

(1)

where L_t is the baseline signal and H_t is the PRC. After the baseline is extracted from X_t, the residual of the signal is a proper rotation component.

Let {τ_k, k = 1, 2,…} denote the times of the local extrema of X_t, with τ₀ = 0 and $L (τ_{0}) = (X (τ_{0}) + X (τ_{1})) / 2$. On intervals where X_t is constant, τ_k is taken as the right endpoint of the interval. Suppose that L_t and H_t have been defined on [0, τ_k] and that X_t is available for t ∈ [0, τ_k₊₂]. The baseline-extracting operator ξ on the interval (τ_k, τ_k₊₁] between two successive extrema is then defined as

ξ X_{t} = L_{t} = L_{τ_{k}} + (\frac{L_{τ_{k + 1}} - L_{τ_{k}}}{X_{τ_{k + 1}} - X_{τ_{k}}}) (X_{t} - X_{τ_{k}})

(2)

Denote $L_{τ_{k}}$ and $X_{τ_{k}}$ as L_k and X_k, respectively, for simplicity, and the above equation is reformulated as

ξ X_{t} = L_{k} + (\frac{L_{k + 1} - L_{k}}{X_{k + 1} - X_{k}}) (X_{t} - X_{k})

(3)

where L_{k+1} is defined as

L_{k + 1} = α [X_{k} + (\frac{τ_{k + 1} - τ_{k}}{τ_{k + 2} - τ_{k}}) (X_{k + 2} - X_{k})] + (1 - α) X_{k + 1}

(4)

where α ∈ (0, 1) is usually set to 0.5, and, as above, τ₀ = 0 and $L_{0} = (X_{0} + X_{1}) / 2$.

According to equation (1), each decomposition step yields a baseline signal and a PRC. The first step gives a baseline $L_{t}^{1}$ and a PRC $H_{t}^{1}$, where $H_{t}^{1}$ contains the higher-frequency content of the original signal X_t. The procedure is repeated p times, each time taking the previous baseline as the new input, until $L_{t}^{p}$ becomes a monotonic function. The signal is thereby decomposed into a monotonic trend and several PRCs whose frequency ranges descend from high to low. The whole process can be expressed as

X_{t} = H X_{t} + ξ X_{t} = H X_{t} + (H + ξ) ξ X_{t} = [H (1 + ξ) + ξ^{2}] X_{t} = (H \sum_{k = 0}^{p - 1} ξ^{k} + ξ^{p}) X_{t} = H_{t}^{1} + H_{t}^{2} + \cdots + H_{t}^{p} + L_{t}^{p}

(5)

where H = 1 - ξ is the proper-rotation-extracting operator, $H ξ^{k} X_{t} = H_{t}^{k + 1}$ is the (k + 1)th PRC, and $ξ^{p} X_{t} = L_{t}^{p}$ is either the monotonic trend or the lowest-frequency baseline.

The instantaneous amplitudes and instantaneous frequencies of the PRCs are then analyzed in the frequency domain, where their spectra reveal the amplitude modulation and frequency modulation of the signal.²⁵
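Equations (1)–(5) can be turned into a short Python/NumPy sketch of one reading of the procedure. It is not the reference implementation of Frei and Osorio: the extremum detector, treating the last sample as a knot with an averaged baseline value (mirroring the rule for τ₀), the time-linear fallback when X_{k+1} = X_k, and the names `extrema_indices`, `baseline`, and `itd` are all illustrative assumptions.

```python
import numpy as np


def extrema_indices(x):
    """Indices of interior local extrema of x; for a flat run, its right endpoint."""
    s = np.sign(np.diff(x))
    # Carry the previous non-zero slope across flat runs so that a plateau's
    # extremum is reported at the plateau's right endpoint.
    for k in range(1, len(s)):
        if s[k] == 0:
            s[k] = s[k - 1]
    return np.flatnonzero(s[1:] * s[:-1] < 0) + 1


def baseline(x, alpha=0.5):
    """One application of the baseline-extracting operator xi, eqs (2)-(4)."""
    n = len(x)
    # Knots: first sample (tau_0 = 0), interior extrema, and the last sample
    # (treating the last sample as a knot is an assumption of this sketch).
    tau = np.concatenate(([0], extrema_indices(x), [n - 1])).astype(int)
    X = x[tau]
    L = np.empty(len(tau))
    L[0] = 0.5 * (X[0] + X[1])        # L_0 = (X_0 + X_1) / 2
    L[-1] = 0.5 * (X[-2] + X[-1])     # mirrored rule at the right end (assumption)
    for j in range(1, len(tau) - 1):  # eq (4), written for knot j = k + 1
        frac = (tau[j] - tau[j - 1]) / (tau[j + 1] - tau[j - 1])
        L[j] = alpha * (X[j - 1] + frac * (X[j + 1] - X[j - 1])) + (1 - alpha) * X[j]
    out = np.empty(n)
    out[0] = L[0]
    for k in range(len(tau) - 1):     # eq (3) on each interval (tau_k, tau_{k+1}]
        t = np.arange(tau[k] + 1, tau[k + 1] + 1)
        dX = X[k + 1] - X[k]
        if dX != 0:
            out[t] = L[k] + (L[k + 1] - L[k]) / dX * (x[t] - X[k])
        else:  # degenerate interval: interpolate linearly in time instead
            out[t] = L[k] + (L[k + 1] - L[k]) * (t - tau[k]) / (tau[k + 1] - tau[k])
    return out


def itd(x, alpha=0.5, max_prcs=10):
    """Return the PRCs H^1..H^p and the final baseline L^p, eq (5)."""
    residual = np.asarray(x, dtype=float)
    prcs = []
    # Stop once the baseline has no interior extrema, i.e. it is monotonic.
    while len(prcs) < max_prcs and len(extrema_indices(residual)) > 0:
        L = baseline(residual, alpha)
        prcs.append(residual - L)     # H_t = (1 - xi) X_t, eq (1)
        residual = L
    return prcs, residual


# Usage: a 100 Hz tone riding on a slower 10 Hz tone, sampled at 2 kHz.
fs = 2000.0
t = np.arange(2000) / fs
x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)
prcs, trend = itd(x)
```

Because each PRC is defined as X minus its baseline, the PRCs and the final baseline always sum back to the input, which makes a convenient sanity check; each pass is linear in the signal length, with no spline fitting.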

Hierarchical temporal memory

HTM is a recently developed machine learning technology that aims to capture the structural and algorithmic properties of the neocortex of the human brain.²⁷ It has been applied to classify human body acceleration patterns,²⁸ vision-based hand shapes,²⁹ remote gaze gestures,³⁰ and sign language.³¹ Compared with traditional ANNs, HTM offers better self-adaptability, higher learning efficiency, and lower sample requirements, and it can recognize complicated patterns in the presence of strong noise.

Most machine learning techniques are relatively static, and their accuracy depends heavily on the quality and quantity of the training data. HTM, in contrast, is an online learning system³² that continuously updates its model as new data arrive. It is a memory-based system that models neurons arranged in columns, regions, layers, and a hierarchy. Figure 1 shows a simplified HTM arranged in a two-level hierarchy: time-varying data enter the bottom layer, and recognition results are read from the top layer. Each layer is divided into regions, and each region consists of a sheet of highly interconnected cells arranged in columns. Figure 2 shows a small section of an HTM region, with four cells per column in a two-dimensional array of columns. Each column connects to a subset of the input, and each cell connects to other cells in the region.³³ This arrangement mimics the brain's information representation known as sparse distributed representation, in which only a small fraction of a large population of neurons is active at any time. As shown in Figure 2, an HTM region creates a sparse distributed representation after receiving input from the level below; the dark neurons are the active cells.

Figure 1.

HTM network arranged in a two-level hierarchy.

Figure 2.

HTM region with a sparse distributed representation.³¹

HTM has three basic functions: learning, recognition, and prediction. An HTM region learns by finding sequences of patterns in sensory data. It first searches for combinations of input bits that frequently occur together, called spatial patterns, and then learns how these spatial patterns appear in sequence over time, which it stores as temporal patterns. After learning, an HTM region can recognize a new input by matching it against the previously learned spatial and temporal patterns using sparse distributed representations. Most of the memory in an HTM is used to store sequences of patterns and the transitions between spatial patterns. By matching stored sequences with the current input, an HTM region can predict which inputs it is likely to receive next.

Based on the representative patterns in the HTM spatial pooler, the numeric input features are transformed into a sparse bit matrix, which effectively suppresses noise. Because the bits of this matrix are relatively independent of one another, they are well suited to a (naive) Bayesian classifier. The representative patterns, described as bit matrices, are fed into the Bayesian classifier as follows:^34,35

1. The prior probability P(C_i) of each working state is approximated by the ratio of the number of samples N_i in that state to the total number of samples N:

P (C_{i}) \approx \frac{N_{i}}{N}

(6)

where i is the index of state C_i.

2. For a test sample X, HTM calculates the posterior probability of each state C_i to which X may belong

P (C_{i} | X) = \frac{P (C_{i}) P (X | C_{i})}{P (X)}

(7)

P (X | C_{i}) = P (x_{1}, x_{2}, \dots, x_{n} | C_{i}) = \prod_{j = 1}^{n} P (x_{j} | C_{i})

(8)

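where $P (X | C_{i})$ is the class-conditional probability of the sample X = (x_1, x_2, …, x_n), factorized under the assumption that its elements are conditionally independent given the state.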

3. The state with the maximum posterior probability is selected as the classification result; because P(X) is the same for every state, it can be omitted

\hat{C} = \arg \max_{C_{i}} P (C_{i}) \cdot P (X | C_{i})

(9)
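The three steps can be sketched in Python for binary feature vectors such as HTM's top-region state vectors. The text does not say how P(x_j | C_i) is estimated, so this sketch assumes a Bernoulli likelihood per bit with Laplace (add-one) smoothing, and it sums log-probabilities to avoid underflow when many factors are multiplied. The names `fit`, `predict`, and `bits` are hypothetical; the example vectors are taken from Table 5, and because the query vectors coincide with training vectors, the usage only demonstrates the mechanics.

```python
import numpy as np


def fit(X, y):
    """Estimate priors P(C_i) (eq 6) and per-bit probabilities P(x_j = 1 | C_i)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])  # N_i / N
    # Laplace (add-one) smoothing is an assumption; it keeps every probability in (0, 1)
    theta = np.array([(X[y == c].sum(axis=0) + 1) / (np.sum(y == c) + 2) for c in classes])
    return classes, priors, theta


def predict(model, X):
    """Return the MAP state (eq 9) using the factorized likelihood of eq (8)."""
    classes, priors, theta = model
    X = np.atleast_2d(np.asarray(X, dtype=float))
    # log P(C_i) + sum_j log P(x_j | C_i); P(X) is common to all states and dropped
    log_post = np.log(priors) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    return classes[np.argmax(log_post, axis=1)]


def bits(s):
    """Parse a space-separated bit string such as '0 1 0'."""
    return [int(b) for b in s.split()]


# Three top state vectors per state from Table 5 (samples 1, 2, 4, 10, 13, 14, 19, 20, 25)
train = [bits("0 1 0 0 0 0 0 0 1 0 1 0"), bits("0 1 0 1 0 0 0 0 1 0 1 0"),
         bits("0 1 0 0 0 0 0 1 1 0 0 0"), bits("0 0 0 0 1 0 1 0 0 0 0 0"),
         bits("0 0 1 0 1 0 0 0 0 0 0 0"), bits("1 0 0 0 0 0 0 0 0 0 0 1"),
         bits("0 0 0 0 0 1 0 0 0 0 0 0"), bits("0 0 0 0 0 1 1 0 0 0 0 0"),
         bits("0 0 0 0 0 1 0 0 0 0 0 1")]
labels = ["normal"] * 3 + ["fracture"] * 3 + ["wear"] * 3
model = fit(train, labels)
# Samples 9, 11 and 22 from Table 5
queries = [bits("0 1 0 1 0 0 0 0 1 0 1 0"), bits("0 0 0 0 1 0 1 0 0 0 0 0"),
           bits("0 0 0 0 0 1 0 0 0 0 0 0")]
pred = predict(model, queries)
```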

Integrative approach of ITD and HTM

The keys to gearbox diagnosis are extracting representative features that are insensitive to varying operating conditions and improving pattern classification accuracy. To enhance gearbox diagnosis under variable operating conditions, this article presents an integrative approach of ITD and HTM, detailed below.

Formulation of the integrative approach

The framework of the integrative approach of ITD and HTM is shown in Figure 3. The vibration signal measured from a gearbox is first processed by ITD to extract fault feature information which is insensitive to varying operating conditions. The extracted features are then converted into the input sequence of HTM.

Figure 3.

Procedure diagram of fault diagnosis based on HTM.

Training of the HTM spatial pooler can be roughly divided into three stages:

Coverage (overlap): for the current input sequence, count how many active input bits each column is connected to;

Inhibition: activate the columns with the highest overlap, setting their bits to 1 and all others to 0;

Learning: update the corresponding coefficients (synapse permanences) of the HTM model.
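The coverage, inhibition, and learning stages can be illustrated with a toy spatial pooler in Python. This is a simplified sketch in the spirit of the HTM spatial pooler, not the NuPIC implementation: it assumes global inhibition (the `n_active` columns with the largest overlap win), a random potential pool per column, and a fixed permanence increment/decrement rule. The class name, parameters, and defaults are illustrative assumptions; the sizes in the usage (240 input bits, 12 columns) are chosen only to mirror the 12 × 20 bitmaps and 12-element top state vectors used later in the article.

```python
import numpy as np


class SpatialPoolerSketch:
    """Toy spatial pooler: coverage (overlap) -> inhibition -> learning.

    Simplified illustration only; parameter names and defaults are assumptions.
    """

    def __init__(self, n_inputs, n_columns, n_active, pool_frac=0.5,
                 threshold=0.5, inc=0.05, dec=0.02, seed=0):
        rng = np.random.default_rng(seed)
        # Each column may connect only to a random subset of input bits.
        self.pool = rng.random((n_columns, n_inputs)) < pool_frac
        # Permanences start near the connection threshold inside the pool.
        start = rng.uniform(threshold - 0.1, threshold + 0.1, (n_columns, n_inputs))
        self.perm = np.where(self.pool, start, 0.0)
        self.threshold, self.inc, self.dec = threshold, inc, dec
        self.n_active = n_active

    def compute(self, bits, learn=True):
        bits = np.asarray(bits, dtype=bool)
        connected = self.perm >= self.threshold
        # 1. Coverage: count connected synapses that land on active input bits.
        overlap = (connected & bits).sum(axis=1)
        # 2. Inhibition: the n_active columns with the largest overlap become 1.
        winners = np.argsort(overlap, kind="stable")[::-1][: self.n_active]
        active = np.zeros(len(overlap), dtype=np.uint8)
        active[winners] = 1
        # 3. Learning: winners strengthen synapses on active bits, weaken the others.
        if learn:
            delta = np.where(bits, self.inc, -self.dec)
            updated = np.clip(self.perm[winners] + delta, 0.0, 1.0)
            self.perm[winners] = updated * self.pool[winners]  # stay inside the pool
        return active


# Usage: a random 240-bit input (the size of a 12 x 20 bitmap) and 12 columns.
x = np.random.default_rng(1).random(240) < 0.25
sp = SpatialPoolerSketch(n_inputs=240, n_columns=12, n_active=3)
first = sp.compute(x)
second = sp.compute(x)  # after one learning step the same columns still win
```

Learning only raises the winners' overlap with an input they have already won, so a repeated input keeps activating the same columns; this is what lets the top region settle on a stable representative pattern per input.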

With the utilization of the spatial pooler, HTM network obtains corresponding representative patterns from the input sequence, and the representative patterns are fed into Bayesian classifier to autonomously identify gearbox defects. The details of ITD and HTM are discussed below.

ITD for feature extraction

EMD and ITD are both adaptive signal decomposition methods that are well suited to nonlinear and nonstationary signals.³⁶ Their performance is compared here using a simulated gearbox vibration signal. First, an amplitude- and frequency-modulated simulation signal x(t) is constructed as follows

x (t) = x_{1} (t) + x_{2} (t) = (1 + 0.2 \sin (2 π \times 10 t)) \cos [2 π \times 400 t + \sin (2 π \times 10 t)] + \sin (2 π \times 100 t)

(10)

White Gaussian noise with amplitude 0.05 is added to the signal x(t), and the time-domain waveform of the resulting simulated signal is shown in Figure 4.
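The simulated signal of equation (10) can be reproduced with a few lines of Python. The sampling rate (5 kHz), duration (1 s), random seed, and the reading of the noise amplitude 0.05 as a standard deviation are assumptions, since the text does not state them; the function name is hypothetical.

```python
import numpy as np


def simulated_signal(fs=5000.0, duration=1.0, noise_std=0.05, seed=0):
    """x(t) of eq (10) plus white Gaussian noise (fs, duration, noise reading assumed)."""
    t = np.arange(int(round(fs * duration))) / fs
    # x1: 400 Hz carrier, 20 % amplitude modulation and unit-index phase modulation at 10 Hz
    x1 = (1 + 0.2 * np.sin(2 * np.pi * 10 * t)) * np.cos(
        2 * np.pi * 400 * t + np.sin(2 * np.pi * 10 * t))
    # x2: pure 100 Hz tone
    x2 = np.sin(2 * np.pi * 100 * t)
    noise = np.random.default_rng(seed).normal(0.0, noise_std, t.size)
    return t, x1 + x2 + noise


t, x = simulated_signal()
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(x.size, d=1 / 5000.0)
```

With a 1 s record every component falls on an exact 1 Hz bin, so the spectrum shows a clean line at 100 Hz and, in the 300–500 Hz band, the 400 Hz carrier flanked by sidebands spaced 10 Hz apart.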

Figure 4.

Time-domain waveform of simulated signal.

ITD is applied to decompose the signal, and the results are shown in Figure 5(a). There are three PRCs and a residual component r3. Most of the signal energy is concentrated in the first two PRCs, PRC1 and PRC2, which correspond to the original signal components x₁(t) and x₂(t), respectively. The third PRC, PRC3, represents the white noise.

Figure 5.

Decomposition results of the simulated signal using (a) ITD and (b) EEMD.

The EEMD decomposition results are shown in Figure 5(b): six intrinsic mode functions (IMFs) are obtained. EEMD requires far more computation time than ITD. ITD completes the decomposition in 0.11 s, whereas EEMD takes 13.5 s when the number of realizations (NR) is set to 50. Moreover, the choice of parameters has a significant influence on the performance of EEMD. ITD is therefore more computationally efficient than EEMD.

ITD is applied to decompose the vibration signal, and the PRCs associated with gearbox defects are selected as the signals of interest. Several features are extracted to represent the gearbox status under variable operating conditions: PRC energy ratio, PRC energy entropy, impulsion index, tolerance index, kurtosis index, peak index, and waveform index. The energy of each PRC is computed as

E_{i} = \sum_{k = 1}^{N} {| c_{i} (t_{k}) |}^{2}

(11)

where c_i(t) is the ith of the n PRCs and N is the number of samples in each PRC. The PRC energy ratio describes the energy distribution across frequency bands²⁷ and can therefore serve as a gearbox feature

p_{i} = \frac{E_{i}}{{(\sum_{j = 1}^{n} E_{j}^{2})}^{\frac{1}{2}}}

(12)

The energy entropy characterizes the uncertainty of the energy distribution of the PRCs across frequency bands and is expressed as

H_{PRC} = - \sum_{i = 1}^{n} p_{i} \log p_{i}

(13)
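Equations (11)–(13) translate directly into Python. This sketch assumes the natural logarithm in equation (13); as a check, the averaged Table 4 ratios for the normal gear under varying speed and varying load give H_PRC ≈ 1.514, close to the tabulated mean of 1.5146. Because equation (12) divides by the Euclidean norm of the energies, the p_i satisfy Σ p_i² = 1 rather than Σ p_i = 1, and the Table 4 columns indeed satisfy Σ p_i² ≈ 1. The function names are hypothetical.

```python
import numpy as np


def energy_ratios(prcs):
    """p_i from eqs (11)-(12) for a sequence of PRC arrays c_i(t)."""
    E = np.array([np.sum(np.abs(np.asarray(c, dtype=float)) ** 2) for c in prcs])  # eq (11)
    return E / np.sqrt(np.sum(E ** 2))  # eq (12): divide by the Euclidean norm of E


def energy_entropy(p):
    """H_PRC from eq (13); natural log assumed, terms with p_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


# Averaged ratios p1..p6 from Table 4 (normal gear, varying speed and varying load)
p_normal = [0.8365, 0.3107, 0.3945, 0.1952, 0.0976, 0.0238]
h_normal = energy_entropy(p_normal)
```

A single dominant PRC (as for the worn gear, p1 ≈ 0.999) drives H_PRC toward zero, while energy spread over several PRCs raises it, which is the contrast exploited in Table 3.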

Under variable speed and load, commonly used dimensional indicators such as the variance and root mean square value are no longer applicable, because their magnitudes change with the operating condition. Most dimensionless indexes, by contrast, are insensitive to the working condition, load, and speed of the equipment; they depend mainly on the state of the device and remain sensitive to faults. Dimensionless indexes are therefore well suited to diagnosis under variable working conditions. The dimensionless indexes selected here are the impulsion index, tolerance index, kurtosis index, peak index, and waveform index.³⁷

These dimensionless indexes are standard descriptors in the statistical analysis of a dataset. The tolerance, impulsion, and kurtosis indexes are sensitive to incipient faults and reflect variations in the sparse impulsive components of the signal. The waveform and peak indexes are sensitive to slight deviations in the time domain. All of the features are summarized in Table 1.

Table 1.

Expressions of features for gearbox defect diagnosis.

Index name	Expression
PRC feature energy	$p_{i} = \frac{E_{i}}{{(\sum_{j = 1}^{n} E_{j}^{2})}^{\frac{1}{2}}}$
ITD energy entropy	$H_{PRC} = - \sum_{i = 1}^{n} p_{i} \log p_{i}$
Impulsion index	$I_{f} = \frac{x_{max}}{| \bar{x} |} = \frac{x_{max}}{\frac{1}{N} \sum_{i = 1}^{N} | x_{i} |}$
Tolerance index	$C L_{f} = \frac{x_{max}}{x_{r}} = \frac{x_{max}}{{(\frac{1}{N} \sum_{i = 1}^{N} \sqrt{| x_{i} |})}^{2}}$
Kurtosis index	$K_{v} = \frac{β}{x_{rms}^{4}} = \frac{\frac{1}{N} \sum_{i = 1}^{N} x_{i}^{4}}{{(\sqrt{\frac{1}{N} \sum_{i = 1}^{N} x_{i}^{2}})}^{4}}$
Peak index	$C_{f} = \frac{x_{max}}{x_{rms}} = \frac{x_{max}}{\sqrt{\frac{1}{N} \sum_{i = 1}^{N} x_{i}^{2}}}$
Waveform index	$S_{f} = \frac{x_{rms}}{| \bar{x} |} = \frac{\sqrt{\frac{1}{N} \sum_{i = 1}^{N} x_{i}^{2}}}{\frac{1}{N} \sum_{i = 1}^{N} | x_{i} |}$

x_max: peak absolute value, max |x_i|; |x̄|: mean absolute value; x_r: square-root amplitude; x_rms: root mean square value; β: fourth-order moment.
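The five dimensionless indexes of Table 1 can be computed as below (a Python/NumPy sketch; the function name is hypothetical). Here |x̄| is the mean absolute value and x_max the peak absolute value. This reading is supported by Table 4, where I_f ≈ C_f × S_f holds in every column (e.g. 4.4393 × 1.3929 ≈ 6.1836), an identity that follows from I_f = x_max/|x̄|, C_f = x_max/x_rms, and S_f = x_rms/|x̄|.

```python
import numpy as np


def dimensionless_indexes(x):
    """Impulsion, tolerance, kurtosis, peak and waveform indexes of Table 1."""
    x = np.asarray(x, dtype=float)
    a = np.abs(x)
    x_max = a.max()                   # peak absolute value
    mean_abs = a.mean()               # |x_bar|, mean absolute value
    x_r = np.mean(np.sqrt(a)) ** 2    # square-root amplitude
    x_rms = np.sqrt(np.mean(x ** 2))  # root mean square value
    return {
        "If": x_max / mean_abs,                 # impulsion index
        "CLf": x_max / x_r,                     # tolerance index
        "Kv": np.mean(x ** 4) / x_rms ** 4,     # kurtosis index
        "Cf": x_max / x_rms,                    # peak index
        "Sf": x_rms / mean_abs,                 # waveform index
    }


# Usage: ten full periods of a unit sine, 1000 samples per period.
t = np.arange(10000) / 1000.0
features = dimensionless_indexes(np.sin(2 * np.pi * t))
```

For a pure sine these reduce to textbook values (C_f = √2, K_v = 1.5, S_f = π/(2√2) ≈ 1.11); ratios of this kind do not change when the whole signal is scaled, which is why they transfer across load levels better than RMS or variance.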

HTM for defect classification

HTM is used here to transform the feature input into a sparse binary matrix. Because the default input format of HTM is a sequence of binary bits, the extracted features are first converted into binary sequences. The specific steps are as follows:

1. Normalization. The extracted features are first normalized to the range [0, 99] so that each is represented by two decimal digits

x'_{i} = floor (\frac{x_{i} - x_{min}}{x_{max} - x_{min}} \times 99)

(14)

where floor(·) rounds down to the nearest integer.

2. Bit vector conversion. Each digit of the normalized value is converted into a 10-bit vector according to Table 2, giving 20 bits per feature. For example, the number 10 (digits 1 and 0) is represented by the bit vector [1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1].

3. Input sequence generation. An N-dimensional feature vector is thus converted into an N × 20 bit matrix. The input sequence can be visualized as a black-and-white binary image, or bitmap, in which black blocks represent 1 and white blocks represent 0; the number of bits in the bitmap equals the length of the sequence. For example, a 12-dimensional feature vector yields the 12 × 20 bitmap shown in Figure 6, with the features arranged from top to bottom.

Table 2.

Bit vectors of the numbers 0 through 9.

Digits	Bit vectors	Digits	Bit vectors
0	1 1 0 0 0 0 0 0 0 1	5	0 0 0 0 1 1 1 0 0 0
1	1 1 1 0 0 0 0 0 0 0	6	0 0 0 0 0 1 1 1 0 0
2	0 1 1 1 0 0 0 0 0 0	7	0 0 0 0 0 0 1 1 1 0
3	0 0 1 1 1 0 0 0 0 0	8	0 0 0 0 0 0 0 1 1 1
4	0 0 0 1 1 1 0 0 0 0	9	1 0 0 0 0 0 0 0 1 1
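The three encoding steps and Table 2 can be sketched in Python. In Table 2 the code for digit d has ones at positions d − 1, d, and d + 1, wrapping around from 9 to 0, which is how `digit_bits` generates it. Taking x_min and x_max per feature over the data, and clipping values that fall outside that range, are assumptions; all function names are hypothetical.

```python
import math

import numpy as np


def normalize(x, x_min, x_max):
    """Eq (14): map a feature value to an integer in [0, 99]."""
    v = math.floor((x - x_min) / (x_max - x_min) * 99)
    return min(max(v, 0), 99)  # clip out-of-range values (assumption)


def digit_bits(d):
    """Table 2: 10-bit code with ones at d - 1, d and d + 1 (wrapping around)."""
    code = [0] * 10
    for j in (d - 1, d, d + 1):
        code[j % 10] = 1
    return code


def encode_value(v):
    """Two decimal digits (tens, units) -> 20-bit vector."""
    return digit_bits(v // 10) + digit_bits(v % 10)


def encode_features(features, mins, maxs):
    """N-dimensional feature vector -> N x 20 bitmap, one row per feature."""
    rows = [encode_value(normalize(x, lo, hi)) for x, lo, hi in zip(features, mins, maxs)]
    return np.array(rows, dtype=np.uint8)


# Usage: the worked example from step 2, and a 12-feature vector -> 240-bit bitmap.
code_10 = encode_value(10)
bitmap = encode_features([0.5] * 12, [0.0] * 12, [1.0] * 12)
```

Neighbouring digits share two of their three ones, so nearby feature values yield overlapping bit patterns, which is the property a spatial pooler relies on when it groups similar inputs.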

Figure 6.

Illustration of HTM classification model.

The input sequence is fed into the bottom region of the HTM model. The top and bottom regions form a parent–child relationship,³² with the top region as the parent. Using the spatial pooler described above, HTM fuses the bit vectors in the bottom region into representative bit patterns in the top region and then classifies these patterns with the Bayesian classifier. HTM thus integrates feature fusion and pattern classification in one model.

Experimental studies

Experiments on a parallel-shaft helical gearbox rig are performed to evaluate the integrative ITD and HTM method. The test rig has a gear ratio of 80:32, and its transmission mechanism is shown in Figure 7. Two types of gear faults, tooth fracture and tooth wear, are introduced as shown in Figure 8. The gearbox is driven by an induction motor (rated power 2 kW) whose speed is set by a variable speed controller. A magnetic brake (maximum load 17.25 N·m) on the output shaft of the gearbox varies the load. Four accelerometers, installed at the front and back ends of the input shaft and the front and back ends of the output shaft, acquire the gearbox vibration through a data acquisition system. All accelerometers are mounted on magnetic bases and oriented vertically.

Figure 7.

Experimental setup of gearbox testbed.

Figure 8.

Photographs of fault gears: (a) tooth fracture and (b) tooth wear.

To investigate gearbox diagnosis under variable operating conditions,³⁸ four speed and load scenarios are considered: (1) constant speed and constant load: the gearbox runs at 480 r/min without load; (2) constant speed and varying load: the gearbox runs at 480 r/min with the load varying from 0% to 40% of the full range; (3) varying speed and constant load: the speed varies from 360 to 600 r/min without load; and (4) varying speed and varying load: the speed varies from 360 to 600 r/min and the load from 0% to 60% of the full range. Take the data acquired at the back end of the output shaft as an example. Figure 9(a) shows the vibration signal of a gear with tooth fracture at rotational speed f_r = 480 r/min. Its spectrum in Figure 9(b) clearly shows the high-speed shaft rotating frequency (f₁ = 19.53 Hz), the gear meshing frequency (f₂ = 640.2 Hz), and the gearbox resonance frequencies (f₃) excited by the tooth fracture. ITD is then applied to decompose the vibration signal, and the results are shown in Figure 9(c).
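As a quick consistency check on these frequencies, assume (the text does not say which shaft carries which gear) that the 80-tooth gear sits on the input shaft driven at 480 r/min and meshes with the 32-tooth gear on the output shaft:

$$
f_{in} = \frac{480}{60} = 8\ \text{Hz}, \qquad f_{mesh} = 80 \times 8 = 640\ \text{Hz}, \qquad f_{out} = 8 \times \frac{80}{32} = 20\ \text{Hz}
$$

Under this assumption the nominal meshing frequency agrees with the measured f₂ = 640.2 Hz, and the nominal 20 Hz output-shaft speed is roughly consistent with the reported high-speed shaft frequency f₁ = 19.53 Hz.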

Figure 9.

Experimental data signal processing using ITD: (a) time series data, (b) spectrum of experimental data, and (c) decomposition results of ITD method.

Next, the features listed in Table 1 are extracted to represent the gearbox status. Together they give 12 input values: one ITD energy entropy, six PRC feature energies, and five dimensionless indexes. Table 3 lists the PRC energy entropy under the different operating scenarios in terms of mean value and variation range. The PRC energy entropy of normal gears is much larger than that of faulty gears. Normal gears produce no obvious impacts and their energy is distributed relatively uniformly, so the uncertainty of the energy's frequency distribution is relatively high. For faulty gears, the vibration energy is concentrated at higher frequencies, mainly at the mesh frequency and its harmonics, so the uncertainty of the frequency distribution is relatively small. The extracted features are thus distinctive for different gear states and robust to variable operating conditions.

Table 3.

Feature evaluation under variable operating conditions.

Operating condition	Gear status	PRC energy entropy: mean value	PRC energy entropy: variation range
Constant speed–constant load	Normal	1.7913	1.7053–1.9285
	Tooth fracture	0.5617	0.5235–0.6304
	Tooth wear	0.2383	0.1728–0.2927
Constant speed–varying load	Normal	1.6243	1.4201–1.7838
	Tooth fracture	0.5367	0.5126–0.5564
	Tooth wear	0.2417	0.1564–0.3054
Varying speed–constant load	Normal	1.6505	1.5329–1.9011
	Tooth fracture	0.5569	0.4567–0.7090
	Tooth wear	0.2354	0.1329–0.3586
Varying speed–varying load	Normal	1.5969	1.4135–1.7987
	Tooth fracture	0.5131	0.4400–0.5749
	Tooth wear	0.2406	0.1711–0.3586

The extracted 12-dimensional feature vector T = [H_PRC, p_1, p_2, p_3, p_4, p_5, p_6, I_f, CL_f, K_v, C_f, S_f] is then fed into the HTM model for autonomous gearbox diagnosis. The average feature vectors are listed in Table 4.

Table 4.

Average feature vectors in all conditions.

Condition	Variable speed–variable load			Variable speed–constant load			Constant speed–variable load			Constant speed–constant load
State	Tooth wear	Tooth fracture	Normal	Tooth wear	Tooth fracture	Normal	Tooth wear	Tooth fracture	Normal	Tooth wear	Tooth fracture	Normal
HPRC	0.2084	0.5475	1.5146	0.1507	0.6742	1.6806	0.1692	0.5564	1.462	0.2755	0.527	1.7473
p1	0.9989	0.9811	0.8365	0.9997	0.9666	0.4655	0.9992	0.9861	0.856	0.9972	0.9855	0.5213
p2	0.0458	0.1911	0.3107	0.0258	0.2507	0.2783	0.0388	0.1607	0.31	0.0738	0.1665	0.3174
p3	0.0069	0.0231	0.3945	0.0049	0.0328	0.5007	0.0052	0.036	0.361	0.011	0.0278	0.6974
p4	0.0032	0.0172	0.1952	0.0026	0.0425	0.6642	0.0012	0.0089	0.168	0.0027	0.0107	0.3115
p5	0.0012	0.0102	0.0976	0.002	0.0069	0.1149	0.0006	0.0177	0.112	0.0012	0.0103	0.208
p6	0.0008	0.0013	0.0238	0.0003	0.0023	0.0287	0.0003	0.0027	0.012	0.001	0.0033	0.0288
If	6.1836	5.0694	4.9755	7.5676	5.4052	4.3441	6.7785	6.6145	4.962	7.2534	4.2774	3.8024
CLf	7.7547	6.0291	5.9127	9.4326	6.421	5.1457	8.4748	7.9781	5.901	8.973	5.0989	4.5023
Kv	5.1572	3.5294	3.187	5.1576	4.1112	2.6557	5.4732	4.2267	3.392	5.9986	3.5567	2.8767
Cf	4.4393	3.9782	3.9274	5.4764	4.2087	3.4973	4.8688	5.0475	3.894	5.2493	3.3358	3.0337
Sf	1.3929	1.2743	1.2669	1.3818	1.2843	1.2421	1.3922	1.3104	1.274	1.3818	1.2823	1.2534

Each feature vector is converted into a 240-bit (12 × 20) bitmap that serves as an input sequence of the HTM network. Figure 10 shows the bitmap of the normal state at constant speed and under constant load. HTM processes each bitmap to obtain the active columns and the top-region state vector of each sample, as listed in Table 5.

Figure 10.

Bitmap of normal state in the working condition of constant speed and constant load.

Table 5.

Active columns and state vectors of the top region for the training samples at constant speed and under constant load.

State	Sample order	Active column area	Top state vectors
Normal	1	10, 8, 1	0 1 0 0 0 0 0 0 1 0 1 0
	2	10, 8, 3, 1	0 1 0 1 0 0 0 0 1 0 1 0
	3	10, 8, 3, 1	0 1 0 1 0 0 0 0 1 0 1 0
	4	8, 7, 1	0 1 0 0 0 0 0 1 1 0 0 0
	5	8, 7, 1	0 1 0 0 0 0 0 1 1 0 0 0
	6	11, 10, 8, 3, 1	0 1 0 1 0 0 0 0 1 0 1 1
	7	10, 8, 2, 1	0 1 1 0 0 0 0 0 1 0 1 0
	8	8, 7, 1	0 1 0 0 0 0 0 1 1 0 0 0
	9	10, 8, 3, 1	0 1 0 1 0 0 0 0 1 0 1 0
Tooth fracture	10	6, 4	0 0 0 0 1 0 1 0 0 0 0 0
	11	6, 4	0 0 0 0 1 0 1 0 0 0 0 0
	12	6, 4	0 0 0 0 1 0 1 0 0 0 0 0
	13	4, 2	0 0 1 0 1 0 0 0 0 0 0 0
	14	11, 0	1 0 0 0 0 0 0 0 0 0 0 1
	15	11, 4	0 0 0 0 1 0 0 0 0 0 0 1
	16	11, 7, 4, 2, 0	1 0 1 0 1 0 0 1 0 0 0 1
	17	11, 4, 2	0 0 1 0 1 0 0 0 0 0 0 1
	18	4, 2, 0	1 0 1 0 1 0 0 0 0 0 0 0
Tooth wear	19	5	0 0 0 0 0 1 0 0 0 0 0 0
	20	6, 5	0 0 0 0 0 1 1 0 0 0 0 0
	21	6, 5	0 0 0 0 0 1 1 0 0 0 0 0
	22	5	0 0 0 0 0 1 0 0 0 0 0 0
	23	5	0 0 0 0 0 1 0 0 0 0 0 0
	24	5	0 0 0 0 0 1 0 0 0 0 0 0
	25	11, 5	0 0 0 0 0 1 0 0 0 0 0 1
	26	5	0 0 0 0 0 1 0 0 0 0 0 0
	27	6	0 0 0 0 0 0 1 0 0 0 0 0
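The two right-hand columns of Table 5 carry the same information: numbering the 12 top-region columns 0 to 11 from the left, the state vector has a 1 exactly at each active column index. A small Python helper (hypothetical name) makes the mapping explicit:

```python
def state_vector(active_columns, n_columns=12):
    """Top-region state vector: bit j is 1 iff column j is active."""
    vector = [0] * n_columns
    for column in active_columns:
        vector[column] = 1
    return vector


# Sample 16 in Table 5: active columns 11, 7, 4, 2, 0
print(state_vector([11, 7, 4, 2, 0]))  # [1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
```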

The top state vectors in Table 5 are taken as feature vectors and classified by the Bayesian classifier to realize fault diagnosis. The k-fold cross-validation method³⁹ is used to calculate the recognition accuracy of the HTM model: the N experimental samples are divided into k disjoint subsets, and each subset in turn serves as the test set while the remaining subsets are used for training. Here, the 36 samples of each operating scenario are divided evenly into 4 subsets. In total, 144 samples are used; in each fold, 108 of them are used for training and 36 for testing. The gearbox defect identification accuracy under variable operating conditions is shown in Figure 11. In the first two operating scenarios, all gearbox defects are identified correctly; in the last two, the identification accuracy is 97.2%.
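A stratified 4-fold split of this kind can be sketched in Python. The sketch assumes 12 samples per gear state in each of the 4 scenarios (36 per scenario, 144 in total) and spreads every scenario–state group evenly over the folds; the seed and function name are hypothetical. Each fold then has 36 test and 108 training samples and every sample is tested exactly once, so a scenario accuracy of 97.2% corresponds to 35 of its 36 samples, and the HTM total in Table 6 (98.6%) is consistent with 142 of 144.

```python
import numpy as np


def stratified_folds(labels, k=4, seed=0):
    """Split sample indices into k disjoint folds with each label spread evenly."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for lab in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == lab))
        for f, chunk in enumerate(np.array_split(idx, k)):
            folds[f].extend(chunk.tolist())
    return [np.sort(np.array(f)) for f in folds]


# Assumed layout: 4 scenarios x 3 gear states x 12 samples = 144 samples
labels = [f"scenario{s}-state{g}" for s in range(4) for g in range(3) for _ in range(12)]
folds = stratified_folds(labels, k=4)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(len(labels)), test_idx)
    print(len(train_idx), len(test_idx))  # 108 36 for every fold
print(round(100 * 35 / 36, 1), round(100 * 142 / 144, 1))  # 97.2 98.6
```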

Figure 11.

Gearbox diagnosis results under different operating conditions: (a) constant speed and constant load, (b) constant speed and varying load, (c) varying speed and constant load and (d) varying speed and varying load.

To demonstrate the advantage of HTM under variable working conditions, commonly used pattern classification techniques, namely a back-propagation (BP) neural network (NN), fuzzy c-means (FCM) clustering, and a support vector machine (SVM), are also tested, and the results are compared in Table 6.

Table 6.

Classification accuracy (%) using different methods.

State	BP	FCM	SVM	HTM with ITD features only	HTM
Normal	97.9	93.8	93.8	100	100
Tooth fracture	93.8	97.9	97.9	93.8	95.8
Tooth wear	87.5	85.4	91.2	97.9	100
Total accuracy	93.1 ± 2.7	92.4 ± 3.6	94.4 ± 1.8	97.2 ± 1.7	98.6 ± 1.3

BP: back-propagation; FCM: fuzzy c-means clustering; SVM: support vector machine; HTM: hierarchical temporal memory; ITD: intrinsic time-scale decomposition.
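
Assume the 144 samples are split evenly across the three gearbox states, giving 48 per state. Then each total accuracy in Table 6 is simply the overall fraction of correctly classified samples, which equals the plain mean of the per-state accuracies. The sketch below reproduces the BP and HTM totals from correct-classification counts. These counts are inferred from the rounded percentages (for example, 95.8% ≈ 46/48). They are a reconstruction for illustration, not data reported by the authors, and the names are hypothetical.

```python
# Hypothetical per-state correct counts, inferred from Table 6 under the
# assumption of 48 test samples per gearbox state (144 samples / 3 states).
SAMPLES_PER_STATE = 48

correct = {
    "BP": {"Normal": 47, "Tooth fracture": 45, "Tooth wear": 42},
    "HTM": {"Normal": 48, "Tooth fracture": 46, "Tooth wear": 48},
}


def accuracies(counts, n=SAMPLES_PER_STATE):
    """Return per-state accuracies (%) and the overall accuracy (%) over all states."""
    per_state = {state: 100.0 * c / n for state, c in counts.items()}
    total = 100.0 * sum(counts.values()) / (n * len(counts))
    return per_state, total


for method, counts in correct.items():
    _, total = accuracies(counts)
    print(f"{method}: total {total:.1f}%")
# BP: total 93.1%
# HTM: total 98.6%
```

For HTM, 142 of 144 samples are classified correctly. This is consistent with two operating scenarios at 36/36 and two at 35/36 (97.2%), as reported for Figure 11.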

According to the HTM classification results, the normal and tooth wear states are classified without error, whereas some tooth fracture samples are misclassified as tooth wear. This is mainly because tooth wear and tooth fracture have similar fault characteristic frequencies in the spectrum. Moreover, as tooth wear becomes more severe, the difference between the two faults becomes even smaller. As a result, on the collected data the presented method misclassifies tooth fracture as tooth wear more often than FCM and SVM do (95.8% vs 97.9% accuracy for tooth fracture). Nevertheless, the total recognition rate of HTM is higher than that of the other classification techniques. This is because BP NN and SVM essentially transform the set of input and output training samples into a nonlinear optimization problem, while FCM is better suited to compact sample sets in which the samples of each class differ little in their characteristics. The standard deviations of the total accuracy further show that the presented method achieves both the highest accuracy and the highest stability. These results show that HTM outperforms the conventional pattern classifiers considered here and can be effectively applied to gearbox fault diagnosis under variable working conditions.

Conclusion

This article developed an integrative ITD and HTM approach for fault diagnosis of gearboxes under variable operating conditions. Simulation and experimental studies were performed to validate the effectiveness of the presented method. The following conclusions can be drawn from the above analysis:

A feature extraction method based on ITD was investigated. Features including the PRC energy ratio, PRC energy entropy, and dimensionless indexes were extracted to represent gearbox status. Experimental studies show that these features are insensitive to variable operating conditions, including speed and load changes.

An integrative feature fusion and pattern classification model based on HTM was developed. Its effectiveness under variable working conditions was validated through performance comparison with BP NN, SVM, and FCM.

More experimental studies under different operating conditions will be performed to further evaluate the presented method in our future research.

Footnotes

Acknowledgements

The authors thank the anonymous reviewers for their valuable comments, which helped improve the quality of this article.

Academic Editor: Pak Wong

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Science Foundation of China (no. 51504274) and the Science Foundation of China University of Petroleum, Beijing (nos 2462014YJRC039 and 2462015YQ0403).

References

1. Zhang, Tian. A new feature extraction and selection scheme for hybrid fault diagnosis of gearbox. Expert Syst Appl 2011; 38: 10000–10009.

2. Jae, David, Brandon. On the use of a single piezoelectric strain sensor for wind turbine planetary gearbox fault diagnosis. IEEE T Ind Electron 2015; 62: 6585–6593.

3. Liu, Wang SH. Human-centered saliency detection. IEEE T Neural Netw Learn Syst 2016; 27: 1150–1162.

4. Chen, Zhang. Sparse feature identification based on union of redundant dictionary for wind turbine gearbox fault diagnosis. IEEE T Ind Electron 2015; 62: 6594–6605.

5. Elasha, Carcel, Mba. Pitting detection in worm gearboxes with vibration analysis. Eng Fail Anal 2014; 42: 366–376.

6. Fan, Zuo MJ. Gearbox fault detection using Hilbert and wavelet packet transform. Mech Syst Signal Pr 2006; 20: 966–982.

7. Cai, Chen. Sparsity-enabled signal decomposition using tunable Q-factor wavelet transform for fault feature extraction of gearbox. Mech Syst Signal Pr 2013; 41: 54–53.

8. Amarnath, Krishna IRP. Local fault detection in helical gears via vibration and acoustic signals using EMD based statistical parameter analysis. Measurement 2014; 58: 154–164.

9. Yang. EMD and wavelet transform based fault diagnosis for wind turbine gear box. Adv Mech Eng 2013; 5: 1–9.

10. Tian, Qian. Planetary gearbox fault feature enhancement based on combined adaptive filter method. Adv Mech Eng 2015; 7: 1–12.

11. Ibrahim, Albarbar, Abouhnik. Adaptive filtering based system for extracting gearbox condition feature from the measured vibrations. Measurement 2013; 46: 2029–2034.

12. Cheng, Yang DJ. The envelope order spectrum based on generalized demodulation time-frequency analysis and its application to gear fault diagnosis. Mech Syst Signal Pr 2010; 24: 508–521.

13. Hajnayeb, Ghasemloonia, Khadem. Application and comparison of an ANN-based feature selection method and the genetic algorithm in gearbox fault diagnosis. Expert Syst Appl 2011; 38: 10205–10209.

14. Saravanan, Ramachandran KI. Fault diagnosis of spur bevel gear box using discrete wavelet features and decision tree classification. Expert Syst Appl 2009; 36: 9564–9573.

15. Feng. Fault features analysis of cracked gear considering the effects of the extended tooth contact. Eng Fail Anal 2015; 48: 105–120.

16. Chen YS. Research on the dynamic mechanism of the gear system with local crack and spalling failure. Eng Fail Anal 2012; 26: 12–20.

17. Liang, Zuo, Hoseini MR. Vibration signal modeling of a planetary gear set for tooth crack detection. Eng Fail Anal 2015; 48: 185–200.

18. Yang TY. Diagnostics of gear deterioration using EEMD approach and PCA process. Measurement 2015; 61: 75–87.

19. Combet, Gelman. An automated methodology for performing time synchronous averaging of a gearbox signal without speed sensor. Mech Syst Signal Pr 2007; 21: 2590–2606.

20. Meltzer, Dien NP. Fault diagnosis in gears operating under non-stationary rotational speed using polar wavelet amplitude maps. Mech Syst Signal Pr 2004; 18: 985–992.

21. Shao, Dong, Wang. Influence of cracks on dynamic characteristics and stress intensity factor of gears. Eng Fail Anal 2013; 32: 63–80.

22. Cai, Han. Study on stress intensity factors for crack on involute spur gear tooth. Adv Mech Eng 2015; 7: 1–12.

23. Wang, Gao, Yan. Multi-scale enveloping order spectrogram for rotating machine health diagnosis. Mech Syst Signal Pr 2014; 46: 28–44.

24. Heyns, Godsill, de Villiers. Statistical gear health analysis which is robust to fluctuating loads and operating speeds. Mech Syst Signal Pr 2012; 27: 651–666.

25. Frei, Osorio. Intrinsic time-scale decomposition: time-frequency-energy analysis and real-time filtering of non-stationary signals. P Roy Soc A-Math Phy 2007; 463: 321–342.

26. Jiang, Chen. Application of the intrinsic time-scale decomposition method to fault diagnosis of wind turbine bearing. J Vib Control 2012; 18: 240–245.

27. Hawkins, George. Hierarchical temporal memory including HTM cortical learning algorithms. Numenta Incorporated, Redwood City, CA, 2007, pp.8–17.

28. Sassi F, Ascari, Cagnoni. Classifying human body acceleration patterns using a hierarchical temporal memory. In: Proceedings of the 11th congress of the Italian association for artificial intelligence, Emilia, 9–12 December 2009, pp.496–505. Berlin: Springer.

29. Kapuscinski. Using hierarchical temporal memory for vision-based hand shape recognition under large variations in hand’s rotation. In: Proceedings of the 10th international conference on artificial intelligence and soft computing, Zakopane, 13–17 June 2010, pp.272–279. Berlin: Springer.

30. Rozado, Rodriguez, Varona. Low cost remote gaze gesture recognition in real time. Appl Soft Comput 2012; 12: 2072–2084.

31. Rozado, Rodriguez, Varona. Extending the bioinspired hierarchical temporal memory paradigm for sign language recognition. Neurocomputing 2012; 79: 75–86.

32. Kostavelis, Gasteratos. On the optimization of hierarchical temporal memory. Pattern Recogn Lett 2011; 33: 670–676.

33. Rodriguez-Cobo, Ruiz-Lombera, Conde. Feasibility study of Hierarchical Temporal Memories applied to welding diagnostics. Sensor Actuat A-Phys 2013; 204: 58–66.

34. Pernkopf, Wohlmayr. Stochastic margin-based structure learning of Bayesian network classifiers. Pattern Recogn Lett 2013; 46: 464–471.

35. Farid, Zhang, Rahman. Hybrid decision tree and naïve Bayes classifiers for multi-class classification tasks. Expert Syst Appl 2014; 41: 1937–1946.

36. Guo, Xie. Online detection of time-variant oscillations based on improved ITD. Control Eng Pract 2014; 32: 64–72.

37. Sun, Chen JQ. Decision tree and PCA-based fault diagnosis of rotating machinery. Mech Syst Signal Pr 2007; 21: 1300–1317.

38. Wang, Makis. A wavelet approach to fault diagnosis of a gearbox under varying load conditions. J Sound Vib 2010; 329: 1570–1585.

39. Wiens, Dale, Boyce. Three way k-fold cross-validation of resource selection functions. Ecol Model 2008; 212: 244–255.