
Abstract

When predicting the remaining life of a cutter, deep-learning methods such as the convolutional neural network do not consider the time correlation between different degradation states, which directly reduces prediction accuracy. To extract features that carry time-series information and predict the remaining cutter life more effectively, this article proposes a new deep neural network, the multi-scale cyclic convolutional neural network. In this network, a multi-scale cyclic convolutional layer is constructed to memorize the degradation state at different moments and to mine the timing characteristics of multiple sensor data. Multi-scale features are extracted through multi-scale convolution, and the convergence of parameters is improved by layer-by-layer training and fine-tuning. Finally, the remaining cutter life is predicted from these features. A comparison with published prediction methods based on convolutional neural network and recurrent neural network models shows that the proposed method gives more precise and accurate predictions of remaining cutter life. The method overcomes limitations of the convolutional neural network prediction model in this field and provides a theoretical basis for evaluating the remaining service life of the cutter.

Keywords

Remaining-life prediction, cutter, convolutional neural network, recurrent network, multi-scale

Highlights

For the sensor noise interference problem, a large-scale convolutional kernel is used to filter signal noise.

A multi-scale convolutional kernel is used to extract multi-scale features and maintain global and local features to improve network capacity and model feature learning capability.

A multi-scale cyclic convolutional layer is constructed to store degradation state information, and this layer is used to mine the time-series features of the original data to improve the remaining-life-prediction accuracy of the model.

Introduction

In advanced manufacturing systems, the high performance of a machine tool is the key to producing high-quality machined surfaces, and the main cause of cutter failure is wear of the cutter contact tip. To ensure machining accuracy within the cutter life cycle, industry generally adopts overly conservative protection strategies, which add machining cost. If the remaining useful life (RUL) of the cutter can be accurately estimated, the cutter can be fully utilized and purchasing costs can be reduced. Moreover, the workload can also be greatly reduced. However, because the contact between the cutter and the workpiece is intermittent, it is a challenge to capture the dynamic characteristics of the cutter wear mechanism, which severely restricts the efficiency and accuracy of cutter RUL prediction. To develop effective RUL prediction systems, scholars have carried out a growing body of research. At present, RUL prediction methods fall into two main types: methods based on physical models and data-driven methods.^1,2

The physical-model approach uses the wear failure mechanism of the tool to model the entire wear degradation process mathematically, then uses empirical formulas to optimize the model parameters, and finally predicts the RUL. However, because the failure mechanisms of different tools are complex, it is difficult to establish an accurate mathematical model in practical applications. By contrast, the data-driven approach only needs data features as input and requires neither the empirical formulas of a physical model nor knowledge of the complex failure mechanism. Therefore, data-driven RUL prediction methods have become increasingly popular in recent years. Benkedouh et al. proposed a tool RUL prediction method based on support vector regression: feature vectors are first extracted from the vibration, force, and acoustic emission signals provided by the 2010 Prognostics and Health Management (PHM) data set, and the RUL of the tool is then predicted by regression, but the results show a large error. Wu et al.³ proposed a tool remaining-life-prediction method based on a random forest model, which extracts 28 feature vectors from cutting force, vibration, and acoustic emission signals and uses them to train the random forest. The model is used to predict the tool wear value, and the experiments show good predictive performance. Drouillet et al. used artificial neural networks (ANNs) to predict the RUL of the cutter from the motor spindle power. Yan and Lee⁴ proposed a logistic regression model that predicts the RUL of a drill bit by establishing the relationship between the vibration signal and the wear value.

Niaki et al.⁵ used wavelet analysis to extract time-domain and frequency-domain features of multi-sensor signals for tool wear prediction with a recurrent neural network (RNN) model, and improved the RNN model by applying sensor information fusion to tool wear estimation. Their results show that its generalization performance can be up to 13%. Drouillet et al.⁶ studied the relationship between RUL and machine tool spindle power and found that the error between the predicted and the real RUL of the tool is very small, which indicates that the spindle power is a very effective feature. Corne et al.⁷ fed spindle power and vibration force signal data into a neural network. The study showed that predicting the tool flank wear value from these signals gives an error of about 0.4%–18.4%. Kong et al.⁸ studied tool wear using kernel principal component analysis with an integrated radial basis function together with Gaussian process regression (GPR). The study showed that the kernel principal component analysis improves the smoothness and the confidence interval range of GPR, and that the method is better than neural networks and support vector machines at improving the accuracy of tool wear prediction. The disadvantage is that these models depend largely on the sensitivity of the extracted features, which usually relies on expert knowledge.^9–12 Tobon-Mejia et al.¹³ used a dynamic Bayesian network (DBN) model to predict and identify the remaining service life of the tool. Kaya used neural networks to verify the reliability of the model in the prediction of milling tool wear. In short, many scholars have used machine learning to extract tool degradation characteristics and predict remaining life.^14–17

As one class of data-driven methods, deep-learning methods¹⁸ can automatically extract features from raw sensor data and build corresponding prediction models.¹⁹ Among deep-learning techniques, the convolutional neural network (CNN)²⁰ has received special attention in tool RUL prediction because of its strengths in processing time-series signals. To take advantage of the powerful feature extraction capabilities of CNN, Babu et al. used CNN for the first time to solve the cutter RUL prediction problem and showed experimentally that it is significantly better than the multilayer perceptron and support vector regression. In recent years, other researchers have also studied RUL from various signals with deep-learning methods such as CNN and RNN.^14,21,22 Although deep-learning models have great advantages in automatic feature extraction, these models cannot obtain feedback information and memory information from time-series data. When these two kinds of information in the sensor data are separated by long-lasting noise signals, prediction may fail. During operation, the tool degenerates from a normal wear state to a completely blunt failure state, which is a gradual degradation process over time. Correspondingly, the degradation states of the cutter at different moments are correlated in time. However, existing research ignores this dependence when constructing the network, which affects the accuracy of the prediction model and limits its adoption. Scholars have proposed many methods that reduce the dimensionality of the original data to improve the accuracy of remaining-life prediction, but they have rarely considered the influence of the time-series information in the data on prediction accuracy. Therefore, establishing a correlation model of the different degradation states is very important for predicting the RUL of the cutter accurately.

To make effective use of the timing information hidden in the signal, this article proposes a new deep-learning method, the multi-scale cyclic convolutional neural network (MSRCNN), to predict the RUL. The basic principle of MSRCNN is as follows. First, a new multi-scale cyclic convolutional network layer is constructed to memorize the time order of different degradation states and to mine the timing characteristics of the original data. Then, multi-scale features are extracted through multi-scale convolution, global and local features are retained, and network parameters are optimized by layer-by-layer training and fine-tuning. Finally, the remaining service life is predicted from these features.

The main contributions of this article are as follows: (1) in response to the problem of sensor noise interference, a large-scale convolution kernel is used to filter out signal noise; (2) a multi-scale convolution kernel is adopted to extract multi-scale features, which maintains global and local features and improves the network capacity and the feature-learning ability of the model; (3) a multi-scale cyclic convolution layer is constructed to memorize the information of the degradation state, and this layer is used to mine the time-series characteristics of the original data and improve the remaining-life-prediction accuracy of the model.

Architecture of MSRCNN model

Cyclic convolutional layer

The convolutional layer is the core component of a CNN; it requires no manual intervention and can extract useful features from the input data. However, the convolutional layer has no recurrent connection, which means that the signal only flows forward in the CNN and the output cannot be fed back to the input. Correspondingly, a CNN considers only the current input at each time step and ignores the previous degradation information. Existing CNN-based cutter RUL prediction methods cannot solve this issue, which reduces their prediction accuracy and generalization ability. Therefore, this article constructs a new cyclic convolutional layer to solve this problem and improve the prediction performance of the algorithm. Unlike the convolutional layer, the cyclic convolutional layer adds a cyclic connection between the output and the input, so that information is transmitted cyclically instead of in one direction. The output information is fed back to the input through the cyclic connection, and the degradation information over time is memorized. Therefore, the output of the cyclic convolutional layer depends on both the current input and the previous state held in memory. Through this dynamic adjustment, the time sequence characteristics of the input data can be fully mined and a temporal correlation model of the different degradation states can be established in the cyclic convolutional layer.

In theory, the cyclic connection enables the cyclic convolutional layer to feed output information back to the input, forming a cyclic information flow rather than a one-way flow. In practice, however, the cyclic convolutional layer often encounters the vanishing gradient problem during training. To reduce the effect of vanishing gradients and capture long-term correlations, a gated selection mechanism is introduced in the cyclic convolutional layer,²³ similar to the gating used in long short-term memory (LSTM) networks.^24,25 With this selective mechanism, the layer can appropriately forget or emphasize information from previous moments as well as the current moment. On one hand, the reset gate determines the extent to which past information is forgotten, which allows the network to discard previously irrelevant information. On the other hand, the update gate controls the amount of information passed from the previous state to the current state, which helps the network remember long-term information and mitigates the vanishing gradient problem. The layer can thus adaptively capture dependencies on different time scales.

As shown in Figure 1, $x_{t}^{i} = f (x_{t}^{i - 1}, h_{t - 1}^{i})$ , where $f (\cdot)$ is the nonlinear activation function, $x_{t}^{i - 1}$ is the input time sequence sensor data, and $h_{t - 1}^{i} = x_{t - 1}^{i}$ is the storage state fed back by the loop connection at time step t – 1. Two gated networks are created in the gated cyclic convolutional layer, namely the reset gate $r_{t}^{i}$ and the update gate $u_{t}^{i}$ as given by

r_{t}^{i} = δ (K_{r}^{i} * x_{t}^{i - 1} + W_{r}^{i} * h_{t - 1}^{i} + b_{r}^{i})

(1)

u_{t}^{i} = δ (K_{u}^{i} * x_{t}^{i - 1} + W_{u}^{i} * h_{t - 1}^{i} + b_{u}^{i})

(2)

where $δ (\cdot)$ is the logistic sigmoid function, * represents the convolution operator, $K_{r}^{i}$ , $W_{r}^{i}$ , $K_{u}^{i}$ , and $W_{u}^{i}$ are the convolution kernels, and $b_{r}^{i}$ and $b_{u}^{i}$ are the bias terms. When the time step is t, the state of the cyclic convolutional layer $x_{t}^{i}$ can be expressed as follows

x_{t}^{i} = u_{t}^{i} ° h_{t - 1}^{i} + (1 - u_{t}^{i}) ° {\tilde{h}}_{t}^{i}

(3)

{\tilde{h}}_{t}^{i} = \tanh (K_{h}^{i} * x_{t}^{i - 1} + W_{h}^{i} * (r_{t}^{i} ° h_{t - 1}^{i}) + b_{h}^{i})

(4)

where ${\tilde{h}}_{t}^{i}$ represents the newly generated state, $\tanh (\cdot)$ is the activation function, $K_{h}^{i}$ and $W_{h}^{i}$ are the convolution kernels, $b_{h}^{i}$ is the bias term, and ° represents the Hadamard product (element-wise multiplication). Equation (3) shows that, at time step t, $x_{t}^{i}$ is a linear interpolation between the state at the previous moment and the newly generated state, weighted by the update gate, while the reset gate in equation (4) controls how much of the previous state enters the newly generated state.

Figure 1.

Cyclic convolution layer gate control mechanism.
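The gate equations (1) to (4) can be traced with a small single-channel sketch. It is only an illustration under stated assumptions, not the authors' implementation: the helper names (`conv1d_same`, `gated_step`, `run_layer`), the zero-padded "same" convolution, the zero initial state, and the example kernel values are all hypothetical choices made here.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation that keeps the input length
    (the usual deep-learning meaning of 'convolution')."""
    half = len(kernel) // 2
    out = []
    for pos in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = pos + k - half
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_step(x_in, h_prev, p):
    """One time step of equations (1)-(4) for a single-channel layer.

    p holds kernels K_r, W_r, K_u, W_u, K_h, W_h (lists of floats)
    and biases b_r, b_u, b_h (floats).
    """
    def pre_activation(K, W, b, state):
        from_input = conv1d_same(x_in, K)
        from_state = conv1d_same(state, W)
        return [a + c + b for a, c in zip(from_input, from_state)]

    r = [sigmoid(v) for v in pre_activation(p["K_r"], p["W_r"], p["b_r"], h_prev)]  # eq. (1)
    u = [sigmoid(v) for v in pre_activation(p["K_u"], p["W_u"], p["b_u"], h_prev)]  # eq. (2)
    reset_state = [ri * hi for ri, hi in zip(r, h_prev)]
    h_new = [math.tanh(v)
             for v in pre_activation(p["K_h"], p["W_h"], p["b_h"], reset_state)]    # eq. (4)
    # eq. (3): interpolate between the previous state and the new state
    return [ui * hi + (1.0 - ui) * ci for ui, hi, ci in zip(u, h_prev, h_new)]


def run_layer(sequence, p):
    """Feed a sequence of input vectors through the layer, starting from a zero state."""
    h = [0.0] * len(sequence[0])
    for x_in in sequence:
        h = gated_step(x_in, h, p)
    return h


# Hypothetical kernels, chosen only to exercise the code.
params = {name: [0.5, -0.25, 0.5] for name in ("K_r", "W_r", "K_u", "W_u", "K_h", "W_h")}
params.update(b_r=0.0, b_u=0.0, b_h=0.0)
state = run_layer([[0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5]], params)
```

Because the update gate lies in (0, 1) and the newly generated state is a tanh output, every entry of the state stays inside (−1, 1) when the initial state is zero, which is one reason this gated form is numerically well behaved.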

Multi-scale and one-dimensional sensing data

The input data of MSRCNN consist of various parameters collected by multiple sensors. To make comprehensive use of all the sensor data, this article uses a sliding window strategy to construct multi-channel one-dimensional sensor data. The process can be expressed as follows

\begin{matrix} I (i, :, t) = x_{s} (i : w + i) \\ i = 1, 2, \dots, m - w \end{matrix}

(5)

where I is the constructed multi-channel data, w is the width of the sliding window, m is the life span, and t is the channel index, with each channel corresponding to a different sensing parameter. The detailed process of data generation is shown in Figure 2. In a convolutional network, convolution kernels of different sizes extract information on different time scales. During the tool wear degradation stage, more and more correlated degradation features are recorded as time goes by. The monitoring data collected by multiple sensors, such as vibration signals, acoustic emission signals, and acceleration vibration signals, also differ from one another.
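The sliding window construction in equation (5) can be sketched as follows. This is a minimal illustration, assuming zero-based indexing, windows of exactly w samples, and m − w windows as in the index range of equation (5); the function name `sliding_windows` and the toy data are hypothetical.

```python
def sliding_windows(channels, w):
    """Build multi-channel windows from equally long sensor sequences.

    channels: one sequence per sensing parameter, each of length m.
    Returns m - w windows; windows[i][k][t] is sample i + k of channel t,
    so column t of window i is channels[t][i : i + w].
    """
    m = len(channels[0])
    if any(len(ch) != m for ch in channels):
        raise ValueError("all channels must have the same length")
    if not 0 < w < m:
        raise ValueError("window width must satisfy 0 < w < m")
    windows = []
    for i in range(m - w):
        window = [[ch[i + k] for ch in channels] for k in range(w)]
        windows.append(window)
    return windows


# Two toy channels of length 5 with a window width of 3 give 2 windows.
toy = [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14]]
toy_windows = sliding_windows(toy, 3)
```

Each window keeps the channels aligned in time, so a convolution kernel sliding along the first axis sees all sensing parameters at the same time step.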

Figure 2.

Multi-scale one-dimensional sensing data generation process.

Therefore, if a single convolution kernel is used in the deep prediction network to extract feature information automatically, degradation information is lost during learning and the prediction accuracy of the model decreases. To avoid this problem, this article proposes a multi-scale learning strategy. As shown in Figure 2, three convolution kernels of different sizes, namely 1 × F, 2 × F, and 4 × F, are arranged in parallel to extract sensitive features from the input sensor signal. In this way, degradation information on different time scales is fully extracted during learning, which ensures the integrity of the features. Before entering the RUL prediction network, the three extracted feature vectors are concatenated as the overall input.
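The parallel multi-scale branches can be illustrated with a short sketch. It assumes that the three kernel lengths are 1, 2, and 4 times a base length, uses fixed averaging weights as placeholders for learned kernels, and concatenates the branch outputs end to end; the names `multi_scale_features` and `base_len` are hypothetical.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation that keeps the input length.
    For even kernel lengths the padding is one sample longer on the left."""
    half = len(kernel) // 2
    out = []
    for pos in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = pos + k - half
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def multi_scale_features(signal, base_len, scales=(1, 2, 4)):
    """Apply one kernel per scale in parallel and concatenate the results."""
    branches = []
    for scale in scales:
        length = scale * base_len
        kernel = [1.0 / length] * length  # placeholder for a learned kernel
        branches.append(conv1d_same(signal, kernel))
    return [value for branch in branches for value in branch]


# An alternating toy signal: the longer kernels smooth it more strongly.
noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
features = multi_scale_features(noisy, base_len=1)
```

With these averaging weights the shortest branch reproduces the input while the longer branches flatten the alternation, which mirrors the idea that large kernels suppress noise and small kernels preserve local detail.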

The overall layout of MSRCNN

The architecture of the MSRCNN proposed in this article is shown in Figure 3. It consists of multi-scale cyclic convolutional layers (MSRCLs), pooling layers (PLs), and fully connected layers (FCLs). To make comprehensive use of the data measured by multiple sensors, a multi-scale learning strategy integrates the multi-sensor data as the input of the network. Then, N MSRCLs and N PLs are created and connected alternately, so that every MSRCL is followed by a PL, to extract the degradation information in the sensor data automatically and to model the different degradation states. The MSRCLs are indexed by i = 1, 2, 3,…, N. The ith MSRCL has $2^{i - 1}$ M convolution kernels of size k * 1, and all other parameter settings are the same across the MSRCLs. The first N – 1 PLs use maximum pooling with non-overlapping sliding windows, that is, p = s (pooling size equal to stride), and the last PL uses global maximum pooling. The output of the last PL is thereby converted into a vector of size $2^{N - 1}$ M, which is input to the subsequent FCLs for RUL estimation. The number L of FCLs is set to 3. Each of the first two FCLs has F neurons with the rectified linear unit (ReLU) activation function, and the third FCL, which is the output layer of MSRCNN, has a single neuron that gives the RUL prediction. The MSRCLs and FCLs all apply dropout and L2 regularization.

Figure 3.

Architecture of multi-scale cyclic convolution network.
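The layer sizes implied by this layout can be tabulated with a short sketch. It only encodes the rules stated above (kernel count $2^{i - 1}$ M for the ith MSRCL, global maximum pooling last, three FCLs); the function name `msrcnn_layer_plan` and the dictionary layout are hypothetical, and the example values M = 16, N = 4, k = 8, F = 100 are those later listed in Table 2.

```python
def msrcnn_layer_plan(M, N, k, F):
    """List the layers described in the text and the length of the
    feature vector that reaches the fully connected layers."""
    plan = []
    for i in range(1, N + 1):
        plan.append({
            "layer": f"MSRCL{i}",
            "kernels": 2 ** (i - 1) * M,   # kernel count doubles per layer
            "kernel_size": (k, 1),
        })
        pooling = "global max" if i == N else "max, non-overlapping (p = s)"
        plan.append({"layer": f"PL{i}", "pooling": pooling})
    feature_len = 2 ** (N - 1) * M         # one value per kernel after global pooling
    plan.append({"layer": "FCL1", "neurons": F, "activation": "ReLU"})
    plan.append({"layer": "FCL2", "neurons": F, "activation": "ReLU"})
    plan.append({"layer": "FCL3", "neurons": 1, "activation": None})  # RUL output
    return plan, feature_len


plan, feature_len = msrcnn_layer_plan(M=16, N=4, k=8, F=100)
for step in plan:
    print(step)
```

For these values the four MSRCLs use 16, 32, 64, and 128 kernels, and the vector passed to the first FCL has 128 entries.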

Experimental setup and data processing

Experimental setup

The experimental platform of the CNC milling machine and the installation positions of the different types of sensors are shown in Figure 4. The rough original skin layer of the raw material is first removed by face milling, and the workpiece is then milled. A Kistler 9265B three-way dynamometer is installed between the workpiece and the processing test bench to measure the cutting force as electric charge, which a Kistler 5019A charge amplifier converts into a voltage signal for storage. Three Kistler piezoelectric accelerometers are installed on the test bench to measure the vibration of the machine tool in the X, Y, and Z directions.

Figure 4.

Positioning of different types of sensors on the CNC milling machine test bench.

Data description

To demonstrate the effectiveness of the proposed method in predicting the remaining service life of the tool, this section applies MSRCNN to experimental milling cutter data. The experimental data come from the Prognostics and Health Management (PHM) Society in New York, which shared them for its 2010 high-speed CNC machine cutter health prediction competition. The experimental conditions are shown in Table 1.

Table 1.

Main equipment and test conditions for milling test.

Hardware condition	Model and main parameters	Cutting conditions	Parameter
CNC milling machine	High-speed CNC machine Roder Tech RBM760	Spindle speed (r/min)	1040
Force gauge	Kistler 9265B three-way force gauge	Feeding speed (mm/min)	1555
Charge amplifier	Kistler 5019a multi-channel charge amplifier	Axial cutting depth (mm)	0.2
Milling material	Inconel 718, rectangular	Radial cutting width (mm)	0.125
Cutter	Ball-end carbide milling cutter with 3 teeth	Feed per pass (mm)	0.001
Data acquisition card	NI DAQ data acquisition card	Sampling frequency (kHz)	50
Wear measuring instrument	LEICA MZ12 microscope	Cooling condition	Dry cut

Data preprocessing

To unify the data range, all sub-data sets are scaled to the range [0, 1] by min-max normalization. The normalization formula is as follows

x_{j}^{i *} = \frac{x_{j}^{i} - \min (x^{i})}{\max (x^{i}) - \min (x^{i})}

(6)

where $x_{j}^{i}$ is the original data of the jth sensing parameter in line i, and $x_{j}^{i *}$ is the corresponding data after normalization.
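Equation (6) can be sketched in a few lines. The function names are hypothetical, and mapping a constant sequence to zeros is an assumption made here to avoid division by zero; the source does not say how that case is handled.

```python
def min_max_scale(values):
    """Scale a sequence to [0, 1] as in equation (6)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant sequence carries no range, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_channels(channels):
    """Apply the scaling to each sequence independently."""
    return [min_max_scale(ch) for ch in channels]


scaled = normalize_channels([[2.0, 4.0, 6.0], [10.0, 10.0, 10.0]])
```

Scaling each sequence separately keeps signals with very different physical units, such as force and vibration, on a comparable range before they enter the network.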

Evaluation index

To evaluate the performance of the proposed method in RUL prediction, this article uses four evaluation indicators: (1) Explained variance score (EVS), which measures how much of the variance of the true values is explained by the regression model. Its best value is 1, and for a reasonable model it lies in [0,1]. The closer it is to 1, the better the independent variable explains the variance of the dependent variable; the smaller the value, the worse the effect. (2) Mean absolute error (MAE), which evaluates how close the predicted results are to the real data. The smaller the value, the better the fit. (3) Mean squared error (MSE), which is the mean of the squared errors between the fitted data and the corresponding sample points of the original data. The smaller the value, the better the fit. (4) $R^{2}$ score, the coefficient of determination, which likewise describes how well the regression model explains the variance. Its best value is 1, and for a reasonable model it lies in [0,1]. The closer it is to 1, the better; the smaller the value, the worse the effect. In the following definitions, y is the true value, $\tilde{y}$ is the predicted value, $\bar{y}$ is the mean of the true values, and n is the number of samples

EVS (y, \tilde{y}) = 1 - \frac{\mathrm{Var} \{y - \tilde{y}\}}{\mathrm{Var} \{y\}}

(7)

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\tilde{y}}_{i} |

(8)

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\tilde{y}}_{i})}^{2}

(9)

R^{2} (y, \tilde{y}) = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\tilde{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(10)
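Equations (7) to (10) translate directly into code. The sketch below uses only the standard library and the population variance; the function names are chosen here for illustration. The toy data are constructed so that the prediction has a constant offset, which shows how EVS and $R^{2}$ can differ.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance, as used in the explained variance score."""
    mu = mean(values)
    return mean([(v - mu) ** 2 for v in values])


def evs(y, y_hat):
    residual = [a - b for a, b in zip(y, y_hat)]
    return 1.0 - variance(residual) / variance(y)                 # eq. (7)


def mae(y, y_hat):
    return mean([abs(a - b) for a, b in zip(y, y_hat)])           # eq. (8)


def mse(y, y_hat):
    return mean([(a - b) ** 2 for a, b in zip(y, y_hat)])         # eq. (9)


def r2(y, y_hat):
    y_bar = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot                                  # eq. (10)


# Toy data: every prediction is 0.5 too high.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
scores = {"EVS": evs(y_true, y_pred), "MAE": mae(y_true, y_pred),
          "MSE": mse(y_true, y_pred), "R2": r2(y_true, y_pred)}
```

For this toy case EVS is 1 because the residual is constant, whereas $R^{2}$ is 0.8 because it also penalizes the offset; MAE is 0.5 and MSE is 0.25.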

Results and discussion

For tool RUL prediction, this article performs five-fold cross-validation on the training data set to determine the network structure parameters of MSRCNN. The structure parameters include the number of convolution kernels M, the size of the convolution kernels k * 1, the number of cyclic convolution layers N, the size of the PLs p, and the number of neurons F. Both the FCLs and the cyclic convolution layers use dropout and L2 regularization, and V stochastic forward passes are used to obtain the predicted mean and variance. In addition, the loss function of MSRCNN is the mean squared error, and the Adam optimizer minimizes it by iteratively updating the weights and biases of the network. The MSRCNN model is trained from scratch for 50 epochs, and its detailed configuration is shown in Table 2.

Table 2.

MSRCNN configuration in milling cutter RUL prediction.

Hyper parameter	Size	Hyper parameter	Size
Number of kernels, M	16	Kernel size, k * 1	8*1
Number of layers, N	4	Pooling size, P	8
Number of neurons, F	100	Dropout probability, Л	0.15
Number of forward passes, V	1000	Number of epochs	50
Mini-batch size	128	Weight decay coefficient, λ	$10^{- 5}$
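The way V stochastic forward passes yield a predicted mean and variance can be sketched with a toy model. This is an assumption-laden illustration: a single linear unit with inverted dropout stands in for the full network, and the names `dropout_forward` and `mc_dropout_predict`, the weights, and the seed are hypothetical. Only the dropout probability 0.15 and V = 1000 are taken from Table 2.

```python
import random


def dropout_forward(features, weights, bias, drop_prob, rng):
    """One stochastic pass of a linear unit with inverted dropout on its inputs."""
    keep = 1.0 - drop_prob
    total = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:
            total += w * x / keep   # rescale so the expected output is unchanged
    return total


def mc_dropout_predict(features, weights, bias, drop_prob, passes, seed=0):
    """Repeat the stochastic pass and summarize the sampled predictions."""
    rng = random.Random(seed)
    samples = [dropout_forward(features, weights, bias, drop_prob, rng)
               for _ in range(passes)]
    mu = sum(samples) / len(samples)
    var = sum((s - mu) ** 2 for s in samples) / len(samples)
    return mu, var


# Toy stand-in for the trained network.
pred_mean, pred_var = mc_dropout_predict(
    features=[1.0, 2.0, 3.0], weights=[0.5, 0.5, 0.5], bias=0.0,
    drop_prob=0.15, passes=1000)
```

The mean of the sampled outputs serves as the point prediction, and the variance gives a rough indication of how uncertain the model is about it.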

First, MSRCNN uses multi-scale convolution to extract multi-scale features; the cyclic convolution layer then models the time correlation of different degradation states and mines the timing characteristics of the data, and layer-by-layer training and fine-tuning improve the convergence of the parameters. The decrease of the loss function during MSRCNN training is shown in Figure 5, with the number of iterations set to 50. After 50 iterations, the training error approaches 0 and the training has converged.

Figure 5.

Training iteration process.

Based on the constructed convolutional layer, this article evaluates the influence of the network depth of MSRCNN on the evaluation indices, as shown in Figure 6. The network depth is increased by increasing the number of cyclic convolutional layers and PLs. It can be seen that, as the network depth initially increases, the EVS and $R^{2}$ values rise and the MAE and MSE values fall. Therefore, choosing the right network depth is very significant for optimizing the model parameters.

Figure 6.

Influence of network depth on evaluation index. (a) values of EVS at different network depths, (b) values of MAE at different network depths, (c) values of R² at different network depths, and (d) values of MSE at different network depths.

To analyze this impact more quantitatively, this article applies networks of five different depths to the milling cutter data and uses the above four evaluation indicators to assess the resulting RUL prediction models, as shown in Table 3.

Table 3.

Influence of network depth on MSRCNN prediction performance.

No. of layers	EVS			MAE			$R^{2}$			MSE
	CNN	RNN	MSRCNN	CNN	RNN	MSRCNN	CNN	RNN	MSRCNN	CNN	RNN	MSRCNN
2	0.786	0.679	0.876	0.223	0.273	0.103	0.789	0.680	0.886	0.111	0.172	0.072
3	0.820	0.723	0.927	0.199	0.284	0.094	0.872	0.792	0.902	0.092	0.131	0.032
4	0.858	0.857	0.953	0.101	0.141	0.051	0.911	0.860	0.950	0.088	0.094	0.004
5	0.838	0.838	0.930	0.106	0.186	0.066	0.903	0.843	0.923	0.099	0.112	0.012
6	0.822	0.829	0.922	0.141	0.199	0.071	0.890	0.823	0.926	0.113	0.131	0.011

It can be seen that as the network depth increases from 2 to 4, the EVS and $R^{2}$ values move closer to 1 and the MAE and MSE values move closer to 0. When the network depth is small (2 layers), the evaluation indices are poor, which may be due to insufficient fitting ability. When the network is deeper (6 layers), the computational burden and the computation time increase, yet the accuracy does not improve. This may be because excessive network depth leads to accuracy saturation and even over-fitting. The above analysis shows that it is particularly important to select an appropriate network depth for the prediction model. Therefore, based on the above data, this article sets the number of cyclic convolutional layers N to 4.
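The choice of N can be checked against the MSRCNN columns of Table 3 with a few lines of code. The values below are copied from Table 3; the helper name `best_depth` and the data layout are hypothetical.

```python
# MSRCNN columns of Table 3, keyed by the number of layers.
TABLE3_MSRCNN = {
    2: {"EVS": 0.876, "MAE": 0.103, "R2": 0.886, "MSE": 0.072},
    3: {"EVS": 0.927, "MAE": 0.094, "R2": 0.902, "MSE": 0.032},
    4: {"EVS": 0.953, "MAE": 0.051, "R2": 0.950, "MSE": 0.004},
    5: {"EVS": 0.930, "MAE": 0.066, "R2": 0.923, "MSE": 0.012},
    6: {"EVS": 0.922, "MAE": 0.071, "R2": 0.926, "MSE": 0.011},
}

# EVS and R2 should be large; MAE and MSE should be small.
HIGHER_IS_BETTER = {"EVS": True, "MAE": False, "R2": True, "MSE": False}


def best_depth(table, metric):
    """Return the depth with the best value of the given metric."""
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(table, key=lambda depth: table[depth][metric])


choices = {metric: best_depth(TABLE3_MSRCNN, metric) for metric in HIGHER_IS_BETTER}
print(choices)
```

All four indicators select a depth of 4, which agrees with the setting adopted in the text.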

The MSRCNN model mainly combines elements of the CNN and the RNN. To verify its effectiveness, the model is compared with the CNN model and the RNN model. Taking the data of milling cutter 1 and milling cutter 2 as the test set, the three models are each used to predict the remaining service life. The prediction results are shown in Figure 7, where the x-axis is time and the y-axis is the percentage of remaining life of the sample at the current moment in the life cycle. The blue, green, and brown solid lines represent the predicted life values, and the red solid line represents the actual life values.

Figure 7.

Prediction results of milling cutter RUL with different models: (a) comparison between CNN model and MSRCNN model (Cutter1); (b) comparison between RNN model and MSRCNN model (Cutter1); (c) comparison between CNN model and MSRCNN model (Cutter2); and (d) comparison between RNN model and MSRCNN model (Cutter2).

Figure 7 shows that the MSRCNN model fits best of the three models and that its predictions are closest to the real RUL.

Table 4 shows that the MSRCNN model proposed in this article is better than CNN and RNN on the evaluation indicators EVS, MAE, $R^{2}$ , and MSE, which shows that MSRCNN provides more accurate RUL prediction results and that its performance is more stable. The comparison shows that the MSRCNN model improves the accuracy of RUL prediction. This is attributed to the powerful feature extraction ability of multi-scale convolution and the ability of cyclic convolution to mine the time sequence characteristics of the data: introducing time sequence features reduces the prediction error.

Table 4.

Prediction error of different models.

The test cutter	Evaluation index	CNN	RNN	MSRCNN
Cutter 1	EVS	0.858	0.857	0.953
	MAE	0.101	0.141	0.051
	$R^{2}$	0.911	0.860	0.950
	MSE	0.088	0.094	0.004
Cutter 2	EVS	0.846	0.838	0.947
	MAE	0.109	0.151	0.054
	$R^{2}$	0.908	0.828	0.928
	MSE	0.089	0.097	0.006

CNN: convolutional neural network; RNN: recurrent neural network; MSRCNN: multi-scale cyclic convolutional neural network; EVS: explained variance score; MAE: mean absolute error; MSE: mean squared error.
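To put the differences in Table 4 on a common scale, the relative error reduction of MSRCNN with respect to each baseline can be computed as (baseline − proposed) / baseline. The sketch below uses the MAE and MSE values from Table 4; the function names and data layout are hypothetical, and the relative reduction is an illustrative summary added here, not an indicator used by the authors.

```python
# MAE and MSE values copied from Table 4.
TABLE4_ERRORS = {
    "Cutter 1": {"MAE": {"CNN": 0.101, "RNN": 0.141, "MSRCNN": 0.051},
                 "MSE": {"CNN": 0.088, "RNN": 0.094, "MSRCNN": 0.004}},
    "Cutter 2": {"MAE": {"CNN": 0.109, "RNN": 0.151, "MSRCNN": 0.054},
                 "MSE": {"CNN": 0.089, "RNN": 0.097, "MSRCNN": 0.006}},
}


def relative_reduction(baseline, proposed):
    """Fraction by which the proposed error is lower than the baseline error."""
    return (baseline - proposed) / baseline


def error_reductions(table, proposed="MSRCNN"):
    """Relative reduction of each error metric against every baseline model."""
    result = {}
    for cutter, metrics in table.items():
        result[cutter] = {}
        for metric, by_model in metrics.items():
            result[cutter][metric] = {
                model: relative_reduction(value, by_model[proposed])
                for model, value in by_model.items() if model != proposed
            }
    return result


reductions = error_reductions(TABLE4_ERRORS)
for cutter, metrics in reductions.items():
    for metric, by_model in metrics.items():
        for model, value in by_model.items():
            print(f"{cutter} {metric} vs {model}: {100 * value:.1f}% lower")
```

For Cutter 1, for example, the MAE of MSRCNN is roughly half that of the CNN model.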

Conclusion

In this article, a new framework, MSRCNN, is proposed for predicting the RUL of cutters. The proposed MSRCNN takes the time-series data collected by different sensors as input and constructs a cyclic convolution layer to model the degradation state of the cutter and mine the time-series characteristics of the data. Multi-scale features are extracted through multi-scale convolution, and the parameters are optimized by layer-by-layer training and fine-tuning. By stacking multiple cyclic convolutional layers and maximum PLs alternately, feature information is automatically extracted from the input data. Finally, the learned features are input into the subsequent FCLs to estimate the RUL, and the result is compared with CNN and RNN prediction methods. Experimental results show that, compared with the existing CNN-based and RNN-based prediction models, the proposed MSRCNN has obvious advantages in accuracy and convergence. We conclude that the proposed method overcomes limitations of the CNN prediction model in evaluating the remaining service life of the cutter.

Footnotes

Handling Editor: Francesc Pozo

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported in part by the East China Jiao Tong University Fundamental Research Funds (grant no. 2003419018) and by the National Natural Science Foundation of China (grant no. 52067006).

ORCID iD

Tao Li

