Abstract
In predicting the remaining cutter life, deep-learning methods such as the convolutional neural network do not consider the time correlation of different degradation states, which directly affects the accuracy of the prediction. To extract features with time-series information and predict the remaining cutter life more effectively, this article proposes a new deep neural network, named the multi-scale cyclic convolutional neural network. In this network, a multi-scale cyclic convolutional layer is constructed to memorize the degradation state at different moments and to mine the timing characteristics of multiple sensor data. Multi-scale features are extracted through multi-scale convolution, and the convergence of parameters is improved by layer-by-layer training and fine-tuning. Finally, the remaining cutter life is predicted based on the features. A comparison with published prediction methods based on convolutional neural network and recurrent neural network models shows that the proposed method is superior in the precision and accuracy of remaining cutter life prediction. This method overcomes the limitations of the convolutional neural network prediction model in this field and provides a theoretical basis for evaluating the remaining service life of the cutter.
Highlights
For the sensor noise interference problem, a large-scale convolutional kernel is used to filter signal noise.
A multi-scale convolutional kernel is used to extract multi-scale features and maintain global and local features to improve network capacity and model feature learning capability.
A multi-scale cyclic convolutional layer is constructed to store degradation state information, and this layer is used to mine the time-series features of the original data to improve the remaining-life-prediction accuracy of the model.
Introduction
In advanced manufacturing systems, the high performance of a machine tool is the key to producing high-quality machined surfaces, and the main cause of cutter failure is wear of the cutter contact tip. To ensure machining accuracy within the cutter life cycle, the industry generally adopts overly conservative protection strategies, which result in additional machining cost. Therefore, if the remaining useful life (RUL) of the cutter can be accurately estimated, the cutter can be fully utilized and the cost of purchasing cutters can be reduced. Moreover, the workload can also be greatly reduced. However, due to the intermittent contact between the cutter and the workpiece, it is a challenge to capture the dynamic characteristics of the cutter wear mechanism, which severely restricts the efficiency and accuracy of cutter RUL prediction. To develop an effective RUL prediction system, scholars have carried out a growing body of research. At present, RUL prediction methods are mainly divided into two types: methods based on physical models and data-driven methods.1,2
The physical-model-based method uses the wear failure mechanism of the tool to model the entire wear degradation process mathematically, then uses empirical formulas to optimize the parameters of the mathematical model, and finally predicts the RUL. However, because the failure mechanisms of different tools are complex, it is difficult to establish an accurate mathematical model in practical applications. In contrast, the data-driven method only needs the data features as input and requires neither the empirical formulas of a physical model nor knowledge of the complex failure mechanism. Therefore, in recent years, data-driven RUL prediction methods have become increasingly popular. Benkedouh et al. proposed a tool RUL prediction method based on support vector regression: feature vectors are first extracted from the vibration, force, and acoustic emission signals provided by the 2010 Prognostics and Health Management (PHM) data set, and the RUL of the tool is then predicted by regression; however, the results show a large error. Wu et al. 3 proposed a tool remaining-life-prediction method based on a random forest model, which extracts 28 feature vectors from cutting force signals, vibration signals, and acoustic emission signals and uses them to train the random forest. The model is used to predict the tool wear value, and the experiment shows that it has good predictive performance. Drouillet et al. used artificial neural networks (ANNs) to predict the RUL of the cutter from the motor spindle power. Yan and Lee 4 proposed a logistic regression model that predicts the RUL of a drill bit by establishing the relationship between the vibration signal and the wear value.
Niaki et al. 5 used wavelet analysis to extract the time-domain and frequency-domain features of multi-sensor signals for tool wear prediction with a recurrent neural network (RNN) model, and improved the RNN model by applying sensor information fusion to tool wear estimation. The study reports a generalization performance of up to 13%. Drouillet et al. 6 studied the relationship between RUL and machine tool spindle power to predict RUL and found that the error between the predicted and the real RUL of the tool is very small, which indicates that the machine tool spindle power is a very effective feature. Corne et al. 7 fed spindle power signal and vibration force signal data into a neural network. The study showed that the error in predicting the tool flank wear value from these signals is about 0.4%–18.4%. Kong et al. 8 studied tool wear using kernel principal component analysis based on an integral radial basis function together with Gaussian process regression (GPR). The study showed that kernel principal component analysis yields smooth results and that GPR provides a confidence interval range; the approach also outperforms neural networks and support vector machines in the accuracy of tool wear prediction. The disadvantage is that these models largely depend on the sensitivity of the extracted features, which is usually achieved through expert knowledge.9–12 Tobon-Mejia et al. 13 used a dynamic Bayesian network (DBN) model to predict and identify the remaining service life of the tool. Kaya used neural networks to verify the reliability of the model in the prediction of milling tool wear. Machine learning is therefore used by many scholars to extract tool degradation characteristics and predict the remaining life.14–17
As one class of data-driven methods, deep-learning methods 18 can automatically extract features from raw sensor data and build corresponding prediction models. 19 Among deep-learning techniques, the convolutional neural network (CNN) 20 has received special attention in tool RUL prediction because of its advantages in processing time-series signals. To take advantage of the powerful feature extraction capabilities of CNN, Babu et al. used CNN for the first time to solve the cutter RUL prediction problem and showed experimentally that this method is significantly better than the multilayer perceptron and support vector regression. Other researchers have also studied RUL associated with various signals by adopting deep-learning methods such as CNN and RNN in recent years.14,21,22 Although deep-learning models have great advantages in automatically extracting features, these models cannot obtain feedback information and memory information from time-series data. When these two kinds of information in the sensor data are separated by long-duration noise, prediction may fail. During operation, the tool degenerates from a normal wear state to a completely blunt failure state, which is a gradual degradation process over time. Correspondingly, the degradation state of the cutter at different moments is related to the time scale. However, existing research ignores this dependence in the network construction process, which affects the accuracy of the prediction model and limits its adoption. Scholars have proposed many methods to reduce the dimensionality of the original data to improve the accuracy of the remaining-life-prediction model, but they have rarely considered the influence of the time-series information in the data on the prediction accuracy. Therefore, establishing correlation models of different degradation states is very important for predicting the RUL of the cutter accurately.
To use the timing information hidden in the signal effectively, this article proposes a new deep-learning method, namely the multi-scale cyclic convolutional neural network (MSRCNN), to predict the RUL. The basic principle of MSRCNN is as follows. First, a new multi-scale cyclic convolutional layer is constructed to memorize the timing of different degradation states and to mine the timing characteristics of the original data. Then, multi-scale features are extracted through multi-scale convolution, global and local features are retained, and network parameters are optimized by layer-by-layer training and fine-tuning. Finally, the remaining service life is predicted based on these features.
The main contributions of this article are as follows: (1) in response to the problem of sensor noise interference, a large-scale convolution kernel is used to filter out signal noise; (2) a multi-scale convolution kernel is adopted to extract multi-scale features, maintaining global and local features and improving the network capacity and the feature-learning ability of the model; (3) a multi-scale cyclic convolutional layer is constructed to memorize the information of the degradation state, which is used to mine the time-series characteristics of the original data and improve the remaining-life-prediction accuracy of the model.
Architecture of MSRCNN model
Cyclic convolutional layer
The convolutional layer is the core component of a CNN; it requires no manual intervention and can extract useful features from the input data. However, the convolutional layer has no recurrent connection, which means that the signal only flows forward in the CNN and the output cannot be fed back to the input. Correspondingly, a CNN considers only the current input at each time step and ignores the previous degradation information. Existing CNN-based cutter RUL prediction methods cannot solve this issue, which reduces their prediction accuracy and generalization ability. Therefore, in this article, a new cyclic convolutional layer is constructed to solve this problem and improve the prediction performance of the algorithm. Unlike the convolutional layer, the cyclic convolutional layer adds a cyclic connection between the output and the input, so that information is transmitted cyclically instead of in one direction. In the cyclic convolutional layer, the output information is fed back to the input through the cyclic connection, and the degradation information over time is memorized. Therefore, the output of the cyclic convolutional layer depends on both the current input state and the previous states stored in memory. Through this dynamic tuning, the time-sequence characteristics of the input data can be fully mined and the temporal correlation model of different degradation states can be established in the cyclic convolutional layer.
In theory, the cyclic connection enables the cyclic convolutional layer to feed its output back to the input, forming a cyclic information flow rather than a one-way flow. In practical applications, however, such a layer often encounters the vanishing-gradient problem during training. To reduce the effect of vanishing gradients and capture long-term correlations, a gated selection mechanism is introduced in the cyclic convolutional layer, 23 analogous to the gating used in long short-term memory (LSTM) networks.24,25 With this selective mechanism, the cyclic convolutional layer can appropriately forget or emphasize information from previous moments as well as from the current moment. On the one hand, the reset gate determines the extent to which past information is forgotten, which allows the network to discard previously irrelevant information. On the other hand, the update gate controls the amount of information passed from the previous state to the current state, which helps the network remember long-term information and alleviates the vanishing-gradient problem. The layer can thus capture dependencies on different time scales adaptively.
The gate control mechanism of the cyclic convolutional layer is shown in Figure 1.

Cyclic convolution layer gate control mechanism.
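A minimal sketch of how a gated cyclic convolution with a reset gate and an update gate can be written, assuming a GRU-style formulation. The symbols are illustrative assumptions rather than the article's own notation: $x_t$ is the input at time step $t$, $h_{t-1}$ the previous state, $r_t$ the reset gate, $z_t$ the update gate, $*$ one-dimensional convolution, $\odot$ element-wise multiplication, and $\sigma$ the logistic sigmoid.

```latex
\begin{aligned}
r_t &= \sigma\left(W_r * x_t + U_r * h_{t-1} + b_r\right) \\
z_t &= \sigma\left(W_z * x_t + U_z * h_{t-1} + b_z\right) \\
\tilde{h}_t &= \tanh\left(W_h * x_t + U_h * (r_t \odot h_{t-1}) + b_h\right) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
```

In this form, a reset gate close to 0 discards the previous state when the candidate $\tilde{h}_t$ is computed, and an update gate close to 1 carries the previous state forward almost unchanged, which matches the roles described for the two gates.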
Multi-scale and one-dimensional sensing data
The input data of MSRCNN are the parameters collected by multiple sensors. To make comprehensive use of all the sensor data, this article uses a sliding-window strategy to construct multi-channel one-dimensional sensor data; the generation process is illustrated in Figure 2.

Multi-scale one-dimensional sensing data generation process.
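The sliding-window construction can be sketched as follows. This is an illustrative example under stated assumptions, not the article's implementation: the function name `sliding_windows` and the `window` and `stride` parameters are hypothetical, and each sensor is assumed to provide one equally long channel of samples.

```python
from typing import List


def sliding_windows(
    signals: List[List[float]], window: int, stride: int = 1
) -> List[List[List[float]]]:
    """Cut multi-channel 1D sensor data into overlapping windows.

    signals: one list of samples per sensor channel, all of equal length.
    Returns a list of windows; each window is indexed as [channel][time].
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not signals:
        return []
    length = len(signals[0])
    if any(len(channel) != length for channel in signals):
        raise ValueError("all channels must have the same length")

    windows = []
    # Slide the window start along the time axis in steps of `stride`.
    for start in range(0, length - window + 1, stride):
        windows.append([channel[start:start + window] for channel in signals])
    return windows
```

For two channels of five samples each, `sliding_windows(signals, window=3, stride=2)` yields two windows, each holding three consecutive samples from every channel.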
In the deep prediction network, if a single convolution kernel is used to extract feature information automatically, the prediction accuracy of the model will decrease because degradation information is lost in the learning process. To avoid this problem, this article proposes a multi-scale learning strategy. As shown in Figure 2, three convolution kernels of different sizes are arranged in parallel in the multi-scale learning strategy, namely 1 ×
The overall layout of MSRCNN
The architecture of MSRCNN proposed in this article is shown in Figure 3. The proposed MSRCNN consists of a multi-scale cyclic convolutional layer (MSRCL), a pooling layer (PL), and a fully connected layer (FCL). To make comprehensive use of the data measured by multiple sensors, this article uses a multi-scale learning strategy to integrate the data from multiple sensors as the input of the multi-scale cyclic convolutional network. Then, creating

Architecture of multi-scale cyclic convolution network.
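The parallel multi-scale convolution can be sketched as below, assuming that each kernel is applied to the same one-dimensional signal and that the resulting feature maps are stacked. The function names and the fixed averaging kernels in the usage note are hypothetical placeholders; in MSRCNN the kernel weights are learned and the kernel sizes are those specified in the article.

```python
from typing import List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Cross-correlate signal with an odd-length kernel, zero-padded so the
    output has the same length as the input."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def multi_scale_features(
    signal: Sequence[float], kernels: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Apply every kernel to the same signal and stack the feature maps,
    one map per kernel scale."""
    return [conv1d_same(signal, kernel) for kernel in kernels]
```

As a usage example, passing three averaging kernels of assumed lengths 3, 5, and 7 returns three feature maps of the input length; the longer kernels smooth more strongly, which illustrates why a large kernel can suppress sensor noise while a small kernel preserves local detail.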
Experimental setup and data processing
Experimental setup
The experimental platform of the CNC milling machine and the installation positions of the different types of sensors are shown in Figure 4. The workpiece is cut from the raw material; the original surface layer with rough particles is first removed by face milling, and the workpiece is then milled. A Kistler9265B three-component dynamometer is installed between the workpiece and the processing test bench to measure the cutting force in the form of electric charge, which is converted into a voltage signal for storage by the Kistler5019A charge amplifier. Three Kistler piezoelectric accelerometers are installed on the test bench to measure the vibration of the machine tool in the

Positioning of different types of sensors on the CNC milling machine test bench.
Data description
To demonstrate the effectiveness of the proposed method in predicting the remaining service life of the tool, this section applies MSRCNN to experimental milling cutter data for RUL prediction. The experimental data come from the Prognostics and Health Management (PHM) Society, New York, which shared the data for its 2010 competition on health prediction of high-speed CNC machine cutters. The experimental conditions are shown in Table 1.
Main equipment and test conditions for milling test.
Data preprocessing
To unify the data range, all sub-data sets in the data are standardized.
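A minimal sketch of one common standardization, assuming z-score scaling (subtract the mean, divide by the standard deviation) applied to each sub-data set separately. The choice of z-score scaling and the function name `standardize` are assumptions made for illustration; the article's own formula may differ, for example min-max scaling.

```python
import math
from typing import List


def standardize(values: List[float]) -> List[float]:
    """Z-score standardization: zero mean and unit (population) variance."""
    n = len(values)
    if n == 0:
        return []
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    if std == 0.0:
        # A constant signal carries no spread; map it to zeros.
        return [0.0] * n
    return [(v - mean) / std for v in values]
```

Applying the function to each sensor channel independently puts force, vibration, and acoustic emission signals on a comparable scale before they are fed to the network.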
Evaluation index
To evaluate the performance of the proposed method in RUL prediction, this article uses four evaluation indicators: (1) Explained variance score (EVS), which measures how much of the variance of the dependent variable the regression model explains. Its value does not exceed 1; the closer it is to 1, the better the independent variables explain the variance of the dependent variable, and the smaller the value, the worse the fit. (2) Mean absolute error (MAE), which measures how close the predicted results are to the real data. The smaller the value, the better the fit. (3) Mean squared error (MSE), which is the mean of the squared errors between the fitted data and the corresponding sample points of the original data. The smaller the value, the better the fit. (4)
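The four indicators used in this article (EVS, MAE, MSE, and R2) can be computed as sketched below, assuming their standard textbook definitions. The function names are hypothetical, and population variance is assumed for EVS.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return _mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return _mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = _mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R2 is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


def explained_variance(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Explained variance score: 1 - Var(y_true - y_pred) / Var(y_true)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_res = _mean(residuals)
    mean_true = _mean(y_true)
    var_res = _mean([(r - mean_res) ** 2 for r in residuals])
    var_true = _mean([(t - mean_true) ** 2 for t in y_true])
    if var_true == 0.0:
        raise ValueError("EVS is undefined for a constant target")
    return 1.0 - var_res / var_true
```

EVS and R2 differ only in that EVS ignores a constant bias in the residuals, so the two coincide when the mean prediction error is zero.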
Results and discussion
For tool RUL prediction, this article performs five-fold cross-validation on the training data set and uses the results to determine the network structure parameters of MSRCNN. The structure parameters include the number of convolution kernels
MSRCNN configuration in milling cutter RUL prediction.
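The cross-validation used to select the structure parameters can be sketched as follows, assuming that it refers to standard five-fold cross-validation with contiguous, unshuffled folds. The function name `k_fold_indices` and the contiguous-fold choice are assumptions made for illustration.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k contiguous folds.

    Returns k (train_indices, validation_indices) pairs; each fold serves
    as the validation set exactly once.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    splits = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, validation))
        start += size
    return splits
```

Each candidate configuration would be trained on the training indices of every split and scored on the held-out fold, and the configuration with the best average score would be kept.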
First, MSRCNN uses multi-scale convolution to extract multi-scale features. It then uses the cyclic convolutional layer to model the time correlation of different degradation states and mine the timing characteristics of the data, and applies layer-by-layer training and fine-tuning to improve the convergence of parameters. The decrease of the loss function during MSRCNN training is shown in Figure 5. The number of iterations is set to 50. It can be seen that after 50 iterations the training error approaches 0 and the training converges.

Training iteration process.
Based on the constructed convolutional layer, this article evaluates the influence of the number of neurons in the MSRCNN layer on the model evaluation index, as shown in Figure 6. This article increases the network depth of MSRCNN by increasing the number of cyclic convolutional layers and PLs. It can be seen that as the network depth increases, the higher the EVS and

Influence of network depth on evaluation index. (a) values of EVS at different network depths, (b) values of MAE at different network depths, (c) values of R2 at different network depths, and (d) values of MSE at different network depths.
To analyze the impact more quantitatively, this article uses five different network depths to predict the milling cutter data and evaluates the resulting RUL prediction models with the above four evaluation indicators, as shown in Table 3.
Influence of network depth on MSRCNN prediction performance.
It can be seen that as the network depth increases, the EVS and R2 values move closer to 1, and the MAE and MSE values move closer to 0. When the network depth is small (a depth of 2), the evaluation indices are poor, which may be due to insufficient fitting ability. When the network is deeper (a depth of 6), the calculation burden becomes heavier and the computation time increases, yet the accuracy does not improve. This may be because excessive network depth leads to accuracy saturation and even over-fitting. The above analysis shows that selecting an appropriate network depth for the prediction model is particularly important. Therefore, based on the above data analysis, this article sets the number of cyclic convolutional layers
The MSRCNN model is mainly composed of the CNN and the RNN. To verify the effectiveness of the model, it is compared with the CNN model and the RNN model. Taking the two sets of cutter data of milling cutter 1 and milling cutter 2 as the test set, the three models are used to predict the remaining service life of each cutter. The prediction results are shown in Figure 7, where the

Prediction results of milling cutter RUL with different models: (a) comparison between CNN model and MSRCNN model (Cutter1); (b) comparison between RNN model and MSRCNN model (Cutter1); (c) comparison between CNN model and MSRCNN model (Cutter2); and (d) comparison between RNN model and MSRCNN model (Cutter2).
Figure 7 shows that the MSRCNN model has the best goodness of fit of the three models and that its predictions are closest to the real RUL.
Table 4 shows that the MSRCNN model proposed in this article is better than CNN and RNN in the evaluation indicators EVS, MAE,
Prediction error of different models.
CNN: convolutional neural network; RNN: recurrent neural network; MSRCNN: multi-scale cyclic convolutional neural network; EVS: explained variance score; MAE: mean absolute error; MSE: mean squared error.
Conclusion
In this article, a new prediction framework, MSRCNN, is proposed for RUL prediction of milling cutters. The proposed MSRCNN takes the time-series data collected by different sensors as input and constructs a cyclic convolutional layer that models the degradation state of the cutter to mine the time-series characteristics of the data. Then, multi-scale features are extracted through multi-scale convolution, and the parameters are optimized by layer-by-layer training and fine-tuning. By stacking multiple cyclic convolutional layers and maximum PLs, feature information is automatically extracted from the input data. Finally, the learned features are fed into the subsequent FCL to estimate the RUL, and the result is compared with CNN and RNN prediction methods. Experimental results show that, compared with the existing CNN-based and RNN-based prediction models, the proposed MSRCNN has clear advantages in accuracy and convergence. We conclude that the proposed method overcomes the limitations of the CNN prediction model in evaluating the remaining service life of the cutter.
Footnotes
Handling Editor: Francesc Pozo
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported in part by the East China Jiao Tong University Fundamental Research Funds (grant no. 2003419018) and by the National Natural Science Foundation of China (grant no. 52067006).
