Abstract
Cutting tool wear prediction plays an important role in the machining of complex aerospace parts, yet it remains a challenge under varying cutting conditions. To overcome the limited generalization ability of existing methods when cutting conditions change substantially, this paper proposes a novel cutting tool wear prediction method based on continual learning. A meta-LSTM model is first trained for specific cutting conditions and can then be fine-tuned with a very small number of samples to adapt to new cutting conditions. In addition, the meta-model can be continuously updated as machining data accumulate by using an orthogonal weights modification method. The experimental results show that the proposed method achieves accurate prediction of tool wear under different cutting conditions. Compared with existing methods, including meta-learning methods, the range of cutting conditions to which the model can adapt is expanded, because the task distribution of new cutting conditions is continuously learned by the prediction model.
Introduction
With the introduction of “Industry 4.0” in Germany, the fourth industrial revolution, led by intelligent manufacturing, has been launched globally. As a key manufacturing technology, Numerical Control (NC) machining faces growing requirements for automation and intelligence. Online prediction of cutting tool wear is an important part of realizing automation and intelligence in the machining process. This need is especially urgent in production lines, where a higher level of automation and intelligence is required to maintain production continuity, improve production efficiency and ensure processing quality.
With the development of the aerospace industry, difficult-to-cut materials such as titanium alloys and high-temperature alloys have been widely used in complex aerospace parts. During the machining of these parts, cutting tool wear is more severe, 1 and the cutting tool may even break, which can significantly affect surface texture or machining precision. 2 In practice, cutting tools are changed more often than necessary because it is difficult to predict tool wear accurately. For example, over 40 cutting tools are needed to complete the milling of a nickel-based superalloy part, 3 and only 50%–80% of cutting tool life is rationally used. 4 On the other hand, new-generation aerospace products place higher requirements on machining accuracy and surface quality to achieve high performance, which in turn raises the requirements on tool wear prediction accuracy. For example, if the tool wear rate is 0.009 mm/min under a certain cutting condition, a prediction accuracy of 0.09 mm is equivalent to the wear accumulated over 10 min of cutting. Because it typically takes about 10 min to process one machining feature, there is a high risk that the cutting tool reaches its wear limit during the process. Cutting conditions with more rapid tool wear or higher surface quality requirements demand even higher prediction accuracy. Furthermore, in some complex cutting conditions, such as corner milling of large-sized parts, tool wear is more difficult to predict, and unexpected excessive wear or breakage may occur during cutting, which may cause part failure. As reported, during the machining of a titanium alloy structural part in an aerospace manufacturing enterprise, severe tool wear caused the workpiece to be ablated and scrapped because the tool state was not predicted in time.
Therefore, accurate online prediction of cutting tool wear is crucial for ensuring machining quality and reducing cost during the machining process, especially for large-sized parts made of difficult-to-cut materials used in airplanes.
This paper proposes a novel method, based on continual learning, for accurately predicting cutting tool wear under different cutting conditions in NC machining. Cutting tool wear under a specific cutting condition is predicted by a Long Short Term Memory (LSTM) network, which serves as the base-model. Model-Agnostic Meta-Learning (MAML) 5 is used to update the meta-model parameters of the LSTM so that it adapts to different cutting conditions. After the meta-LSTM model has been trained, it can be fine-tuned with a very small number of samples to adapt to new cutting conditions. In addition, the meta-model, which incorporates efficient and scalable continual learning, can be continuously updated with new cutting conditions during the machining process, that is, by learning different tasks sequentially, one at a time. The meta-parameters are fine-tuned by the orthogonal weights modification method with a small number of samples from the new cutting conditions. Compared with existing meta-learning methods, the range of cutting conditions to which the model can adapt is expanded, because the task distribution of new cutting conditions is continuously learned by the prediction model.
Related works
There have been many reported studies on cutting tool wear prediction. Earlier research aimed to predict tool wear by modeling the tool wear mechanism.6–12 For example, Rech et al. 10 proposed a tribological approach to predict cutting tool wear. It identified a fundamental wear model using a dedicated tribometer that is able to simulate the relevant tribological conditions encountered along the tool–work-material interface. However, tool wear during machining is a complex physical and chemical process, and traditional models can consider only some specific processes, such as friction and deformation. Within these processes, tool wear is influenced by many factors, of which traditional models can consider only a few, such as cutting temperature, cutting force and cutting tool material. Even then, the tool wear process can be modeled only with significant simplifications and assumptions, while the solution procedure remains complex and yields only approximate results. As a result, traditional prediction methods are not accurate and can only predict the tool wear stage within the whole tool life.
Because of the complexity of tool wear during machining, establishing tool wear mechanism models is increasingly challenging. Data-driven methods, in contrast, can learn models from a large volume of data, and such models can be equivalent to complex mechanism models within a certain range of error. Data-driven methods therefore provide a new approach to accurate tool wear prediction.13–18
Some existing data-driven methods, such as deep learning, have been used to predict cutting tool wear.19–21 For example, Zhao et al. 20 proposed a deep learning method for tool wear prediction. Convolutional Bi-Directional LSTM Networks were established, in which the convolutional neural network was used for feature extraction and the LSTM network was used for tool wear monitoring. Shi et al. 21 established two deep auto-encoder networks, one for signal feature extraction and the other for tool wear prediction in the given cutting condition. Deep learning has limitations under varying cutting conditions, because it needs to be trained on different cutting conditions with a large number of labeled samples, each of which must include the monitoring signals and the corresponding tool wear label. Considering the multi-dimensional input space and the complex structure of a deep network for tool wear prediction, the number of training samples needed by deep learning may be 10,000 or even 100,000. However, in actual machining, labelled samples are difficult to obtain. The experiments for obtaining samples are time-consuming and costly, because each tool wear label must be measured through a series of sophisticated operations that interrupt the machining process. It is therefore impossible to train a perfect deep learning model for predicting tool wear under all cutting conditions.
To overcome the need of deep learning for a large number of labelled samples, a meta-learning method for tool wear prediction was proposed by the authors Li et al. 22 Different base-models are trained on specific tasks. Because each base-model covers only one specific cutting condition, it does not need to be very complex and its sample requirement is small, so each base-model is easy to train. A meta-learning model is trained during the training process of the base-models; it learns the underlying regularity in how the base-models change and can be easily adjusted to adapt to new cutting conditions. In this case, the required training samples are significantly reduced compared with deep learning, which tries to train a single complex model.
However, if the cutting conditions change substantially (e.g. the cutting parameters or the diameters of the cutting tools are quite different), the prediction accuracy of meta-learning may decrease, because the generalization ability is limited by the task distribution of the base-models for different cutting conditions. 23 This limitation needs to be addressed further.
Cutting tool wear prediction method based on continual learning
Approach overview
The overall idea of the proposed tool wear prediction method is shown in Figure 1. The base-model is an LSTM that predicts tool wear under a specific cutting condition by taking advantage of the time-series modeling ability of Recurrent Neural Networks (RNN), so that the cutting tool wear rule can be implicitly learned. The inputs of the LSTM are signal features of vibration, power and current, which are preprocessed by entropy weight-grey correlation analysis 22 and manifold learning. MAML 5 is used to update the meta-model parameters of the LSTM so that it adapts to different cutting conditions. After the meta-LSTM model has been trained, it can be fine-tuned with a very small number of samples to adapt to new cutting conditions. In addition, the meta-model, which incorporates efficient and scalable continual learning, can be continuously updated with new cutting conditions during the machining process, that is, by learning different tasks sequentially, one at a time. The meta-parameters are fine-tuned by the orthogonal weights modification method with a small number of samples from the new cutting conditions. Compared with existing meta-learning methods, the range of cutting conditions to which the model can adapt is expanded, because the task distribution of new cutting conditions is continuously learned by the prediction model. Such an ability is crucial to humans as well as artificial intelligence agents for two reasons: (1) there are too many possible cutting conditions to learn concurrently, and (2) useful mappings cannot be pre-determined but should be learned when the corresponding cutting conditions are encountered.

Figure 1. The overall idea of the proposed tool wear prediction method.
Continual learning for cutting tool wear prediction based on orthogonal weights modification method
The main idea of continual learning is to learn different tasks sequentially, one at a time. The continual learning method can therefore update the meta-model parameters with new cutting conditions, whereas in the meta-learning method the meta-model remains fixed once it has been trained. Compared with existing meta-learning methods, the range of cutting conditions to which the model can adapt is expanded, because the task distribution of new cutting conditions is continuously learned by the prediction model. Such an ability is crucial to humans as well as artificial intelligence agents.
The main obstacle to achieving continual learning is that conventional neural network models suffer from catastrophic forgetting, that is, training a model on new tasks interferes with previously learned knowledge and leads to significant decreases in performance on previously learned tasks.24,25 To avoid catastrophic forgetting, the orthogonal weights modification method 26 is used for continual learning. Specifically, when the tool wear prediction model is adjusted for new cutting conditions, its weights can only be modified in the direction orthogonal to the subspace spanned by all previously learned inputs, as shown in Figure 2. This ensures that new learning processes do not interfere with previously learned tasks, because the weight changes in the network as a whole do not interact with old inputs. Consequently, combined with a gradient descent-based search, the orthogonal weights modification method helps the prediction model to find a weight configuration that can accomplish new tasks while leaving the performance on learned tasks unchanged. This is achieved by constructing a projector used to find the direction orthogonal to the input space, represented as formula (1):

Figure 2. The representation of the orthogonal weights modification method in the tool wear prediction model.
where matrix
In the (i+1)th task, the tool wear prediction model is adjusted by the orthogonal weights modification method according to formula (2):
The input space is orthogonal to the projector P, so the second term on the right-hand side of the formula is equal to 0. This means that new learning processes do not interfere with previously learned tasks.
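The orthogonality argument above can be illustrated with a minimal NumPy sketch. It assumes the projector takes the form P = I − A(AᵀA + αI)⁻¹Aᵀ that is commonly used with the orthogonal weights modification method, with previously learned inputs stored as the columns of A and α a small regularization constant. The single linear layer, the random data and all names (`owm_projector`, `A_old`, `lr`) are hypothetical and do not reproduce the paper's formulas (1) and (2).

```python
import numpy as np

def owm_projector(A, alpha=1e-3):
    """Assumed projector P = I - A (A^T A + alpha I)^-1 A^T; columns of A are old inputs."""
    n_features, n_inputs = A.shape
    gram = A.T @ A + alpha * np.eye(n_inputs)
    return np.eye(n_features) - A @ np.linalg.solve(gram, A.T)

rng = np.random.default_rng(0)
A_old = rng.normal(size=(20, 5))   # 5 previously learned input vectors (hypothetical)
W = rng.normal(size=(1, 20))       # weights of one linear layer (hypothetical)
grad = rng.normal(size=(1, 20))    # stand-in for a gradient computed on a new task

P = owm_projector(A_old)
lr = 0.1
W_owm = W - lr * grad @ P          # update restricted to directions orthogonal to old inputs
W_plain = W - lr * grad            # ordinary gradient step, for comparison

# Change in the layer output on the old inputs caused by each update.
drift_owm = np.abs(W_owm @ A_old - W @ A_old).max()
drift_plain = np.abs(W_plain @ A_old - W @ A_old).max()
print(f"output drift on old inputs: OWM {drift_owm:.2e}, plain {drift_plain:.2e}")
```

Because P·A is close to zero, the projected update leaves the outputs on old inputs almost unchanged, whereas the plain gradient step shifts them noticeably. The residual drift of the projected update shrinks as α is reduced.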
To calculate P in formula (1), an iterative method can be used. Specifically, consider a neural network of L+1 layers, indexed by l=0, 1, 2, …, L with l = 0 and l = L being the input layer and output layer, respectively. All hidden layers share the same activation function g.
In the orthogonal weights modification method, the orthogonal projector P_l defined in the input space of layer l for learned tasks (cutting conditions that the model has encountered) is key to overcoming catastrophic interference in continual learning. In practice, P_l can be recursively updated for each task in a way similar to calculating the correlation-inverse matrix in the recursive least squares (RLS) algorithm,27,28 as shown in formula (3):
This method allows P_l to be determined from the current inputs and the P_l of the last task. It also avoids the matrix-inverse operation in the original definition of P_l. The detailed procedure for implementing the orthogonal weights modification method is shown in Algorithm 1.
The algorithm does not need to store all previous inputs A. Instead, only the current inputs and the projector for the last task are needed. This iterative method is related to the RLS algorithm, which can be used to train feed-forward networks and RNNs to achieve fast convergence, tame chaotic activities 29 and avoid interference between consecutively loaded patterns or tasks. 30
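The recursive computation described above can be sketched as follows. The update rule shown is the standard RLS-style rank-one step (a Sherman-Morrison update) and is assumed here to correspond to formula (3). The function names, the starting value P = I and the random inputs are illustrative assumptions.

```python
import numpy as np

def batch_projector(A, alpha):
    """Assumed batch definition: P = I - A (A^T A + alpha I)^-1 A^T."""
    n_features, n_inputs = A.shape
    gram = A.T @ A + alpha * np.eye(n_inputs)
    return np.eye(n_features) - A @ np.linalg.solve(gram, A.T)

def recursive_update(P, x, alpha):
    """Fold one input vector x into P without any matrix inverse (RLS-style step)."""
    Px = P @ x
    k = Px / (alpha + x @ Px)          # gain vector
    return P - np.outer(k, x @ P)      # P - k x^T P

rng = np.random.default_rng(1)
alpha = 1e-2
inputs = rng.normal(size=(6, 8))       # 6 hypothetical layer inputs of dimension 8

P = np.eye(8)                          # projector before any input has been seen
for x in inputs:                       # only the current input and the last P are needed
    P = recursive_update(P, x, alpha)

P_batch = batch_projector(inputs.T, alpha)
max_gap = np.abs(P - P_batch).max()
print(f"largest difference between recursive and batch projector: {max_gap:.2e}")
```

Under these assumptions the recursive result matches the batch definition up to floating-point error, which is why earlier inputs do not have to be stored.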
In addition, the capacity of the orthogonal weights modification method is analyzed, that is, how many different cutting conditions can be learned and adapted to using this method. The capacity of one network layer can be measured by the rank of P(i), which is defined as the orthogonal projector calculated after cutting condition i, with ΔP(i+1) then defined as the update in the next cutting condition satisfying P(i+1) = P(i) − ΔP(i+1). As range
where
Meta-learning modeling
The continual learning method can update the parameters of the tool wear prediction model with data from new cutting conditions during the machining process. To achieve good continual learning performance, the initial parameters of the prediction model are obtained by meta-learning.
The effects of tool wear and of the cutting condition on the monitoring signals are coupled, which makes it difficult to predict tool wear accurately when the cutting condition changes. Therefore, the tool wear prediction model should have a certain adaptability to changing cutting conditions.
The meta-learning algorithm considers a model f and a distribution over tasks P(T). The algorithm tries to find the ideal parameter

Figure 3. The meta-learning mechanism.
The mechanism of learning to learn (meta-learning) is therefore applied so that the model, after being trained on different tasks, can quickly adapt to new tasks. The key idea of meta-learning is to train a model’s parameters during a meta-learning phase on a set of tasks such that a few gradient steps, or even a single gradient step, can produce good results on new tasks. It can be viewed as establishing a general representation that is broadly adaptable to different tasks.
A base-model in meta-learning is a tool wear prediction model under a specific cutting condition. The architecture of the base-model directly impacts the accuracy of tool wear prediction.
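The idea of learning an initialization that adapts in one gradient step can be sketched on a toy problem. The example below is not the paper's meta-LSTM: a two-parameter linear regressor stands in for the base-model, each synthetic line stands in for one cutting condition, and the exact MAML meta-gradient is available in closed form because the model is linear. The task distribution, step sizes and all function names are assumptions made for illustration.

```python
import numpy as np

def mse_and_grad(theta, X, y):
    err = X @ theta - y
    return float(err @ err) / len(y), 2.0 * X.T @ err / len(y)

def sample_task(rng, n=10):
    """Hypothetical task: a line whose slope and offset stand in for one cutting condition."""
    slope, offset = rng.normal(3.0, 0.5), rng.normal(-1.0, 0.5)
    x = rng.uniform(-1.0, 1.0, size=2 * n)
    X = np.column_stack([x, np.ones_like(x)])
    y = slope * x + offset
    return (X[:n], y[:n]), (X[n:], y[n:])      # support set, query set

def adapt(theta, X_s, y_s, inner_lr):
    """Inner loop: one gradient step on the support set of a task."""
    _, g = mse_and_grad(theta, X_s, y_s)
    return theta - inner_lr * g

def maml_train(rng, steps=500, tasks_per_step=8, inner_lr=0.1, outer_lr=0.05):
    theta = np.zeros(2)
    for _ in range(steps):
        meta_grad = np.zeros_like(theta)
        for _ in range(tasks_per_step):
            (X_s, y_s), (X_q, y_q) = sample_task(rng)
            theta_task = adapt(theta, X_s, y_s, inner_lr)
            _, g_q = mse_and_grad(theta_task, X_q, y_q)
            # Chain rule through the inner step: for a linear model the
            # Jacobian of the adapted parameters is I - inner_lr * Hessian.
            hessian_s = 2.0 * X_s.T @ X_s / len(y_s)
            meta_grad += (np.eye(2) - inner_lr * hessian_s) @ g_q
        theta -= outer_lr * meta_grad / tasks_per_step
    return theta

def one_step_query_loss(theta, rng, n_tasks=200, inner_lr=0.1):
    """Average query loss after a single adaptation step from the given initialization."""
    losses = []
    for _ in range(n_tasks):
        (X_s, y_s), (X_q, y_q) = sample_task(rng)
        loss, _ = mse_and_grad(adapt(theta, X_s, y_s, inner_lr), X_q, y_q)
        losses.append(loss)
    return float(np.mean(losses))

theta_meta = maml_train(np.random.default_rng(0))
loss_meta = one_step_query_loss(theta_meta, np.random.default_rng(1))
loss_zero = one_step_query_loss(np.zeros(2), np.random.default_rng(1))
print(f"one-step query loss: meta-init {loss_meta:.3f}, zero-init {loss_zero:.3f}")
```

In this toy setting the meta-learned initialization settles near the centre of the task distribution, so a single gradient step on a handful of samples already gives a much lower query loss than the same step taken from an uninformed initialization.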
Tool wear is a process that changes over time. The current tool wear is related to the previous wear state during a period of time. For this reason, the monitoring signal features of the prior tool wear state will be used for predicting the current tool wear.
RNN is a sequential neural network model in which connections between units form a directed loop, and it is usually used for sequential data. It processes the input sequence one element at a time, maintains a “state vector” in the recurrent layer, and thereby implicitly retains historical information about all past elements of the sequence. This historical information is passed to the current output layer or to the next recurrent step. RNN has a powerful ability to capture context information in sequences of different lengths, which makes it suitable for time-series tasks such as tool wear prediction. Tool wear data is auto-correlated, and the current tool wear value can be calculated from the current sensor monitoring signals and historical data from the previous time period. In the tool wear prediction model, the order of the tool wear monitoring signal data should be considered, and the model should be constructed using auto-correlated data to describe the dynamic process. The tool wear monitoring signal data is therefore “context sensitive” data. For this reason, a class of RNN, Long Short Term Memory (LSTM), is adopted to establish the prediction model by considering the entire tool wear related information. LSTM is a variant of RNN designed to solve the vanishing gradient problem of RNN models on long sequences. Using LSTM, the law of tool wear over time can be implicitly learned, and the model parameters can be fine-tuned by collecting labeled samples at the beginning of a new operating condition. This makes it more suitable than traditional fully connected neural networks for changing conditions, such as corners or arcs.
As depicted in Figure 1, the base-model uses an LSTM to learn an abstract representation of the time-sequenced input features, and the output of the LSTM is then used as the input of fully connected layers to predict the current tool wear. The prediction module therefore consists of LSTM units arranged in four layers for each prediction: one feature input layer, the recurrent layer, the hidden layer and the fully connected regression layer. Firstly, the features
Where
The parameters of the overall structure of the model are studied, including the stacking depth of the neural network model, the number of neurons in each layer, and the selection of the activation function. The control variable method is used to quantitatively analyze the impact of different parameters on the overall performance of the base-model. The optimal value of each parameter is selected for the final tool wear prediction model.
The final structure and hyperparameters of the base-model are as follows: the input layer has 20 units, the single hidden layer has 20 units, the number of LSTM time steps is set to 4, and the output layer has one unit, which gives the tool wear value.
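A forward pass through a base-model with these dimensions can be sketched in NumPy as follows. This is an illustrative sketch only: it uses the standard LSTM gate equations, interprets the 20 hidden units as the LSTM state size, and collapses the hidden and regression layers into a single linear output head. The weights are random, and the names `init_params` and `predict_wear` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(rng, n_in=20, n_hidden=20):
    """Random weights for one LSTM layer plus a linear regression head (illustrative only)."""
    scale = 0.1
    return {
        "W": rng.normal(scale=scale, size=(4 * n_hidden, n_in)),      # input weights of the 4 gates
        "U": rng.normal(scale=scale, size=(4 * n_hidden, n_hidden)),  # recurrent weights
        "b": np.zeros(4 * n_hidden),
        "w_out": rng.normal(scale=scale, size=n_hidden),
        "b_out": 0.0,
    }

def predict_wear(params, window):
    """window has shape (time_steps, n_in); returns one scalar wear estimate."""
    n_hidden = params["U"].shape[1]
    h = np.zeros(n_hidden)                       # hidden state
    c = np.zeros(n_hidden)                       # cell state
    for x_t in window:                           # one signal-feature vector per time step
        z = params["W"] @ x_t + params["U"] @ h + params["b"]
        i, f, g, o = np.split(z, 4)              # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                        # keep part of the past, add new information
        h = o * np.tanh(c)
    return float(params["w_out"] @ h + params["b_out"])

rng = np.random.default_rng(0)
params = init_params(rng)
window = rng.normal(size=(4, 20))                # 4 time steps x 20 features (random stand-in)
wear = predict_wear(params, window)
print(f"untrained wear estimate: {wear:.4f}")
```

The cell state c carries information across the four time steps, which is how the prediction for the current step can depend on the preceding signal features.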
Case study
In this paper, 13 sets of experiments are designed, of which nine sets are used for training the meta-learning model and the remaining four sets are used to test the tool wear prediction accuracy of continual learning under changing cutting conditions. A reasonable selection of cutting parameters for machining titanium alloy ensures that the cutting tool remains in a state of continuous wear. In each experiment, the flank wear of the cutting tool ranged from 0.10 mm to 0.40 mm.
In the experiments for obtaining training samples, the cutting tools were carbide end milling tools with parameters of 12*12*24*90*R1. The cutting parameters varied within the following ranges: spindle speed (n: 1750 r/min–1850 r/min), feed per tooth (fz: 0.045 mm/r–0.055 mm/r) and cutting depth (ap: 2.5 mm–3.5 mm), as shown in Table 1.
Table 1. Experiment groups and cutting parameters: training set.
In the experiments for model testing, cutting tools of the same material but with different diameters (D16 and D20) were used. The cutting parameters varied within the following ranges: spindle speed (n: 1300 r/min–2200 r/min), feed per tooth (fz: 0.05 mm/r–0.07 mm/r) and cutting depth (ap: 3 mm–8 mm). The cutting tool parameters and cutting parameters thus change dramatically compared with the training sets, especially in the last test set, as shown in Table 2. Compared with our previous research, 22 the cutting conditions in the test sets change much more.
Table 2. Experiment groups and cutting parameters: test set.
As shown in Figure 4, the experiment was performed on a DMG80P machine tool. The part material was titanium alloy TC4, and the machining feature was a pocket bottom. The vibration, spindle power and current signals were collected during the whole machining process. The vibration sensor was a KSI-108M500 with an acquisition frequency of 500 Hz. The spindle power and current signals were collected through the OPC UA interface of the Siemens CNC system at a collection frequency of 100 Hz.

Figure 4. The experiment scene and equipment: (a) experiment scene and (b) experiment equipment.
The tool wear labels were collected with an XK-T600 V™ microscope (calibrated measurement accuracy: 0.01 mm). In this paper, 246 tool wear labels were actually measured, with about 20 labels collected under each cutting condition. Because the change in value between two adjacent collected labels is very small, denser collection is not required. For each tool, because the time between two adjacent labels is short and the change in tool wear is small, tool wear can be regarded as a gradual process. To further increase the number of labeled samples available for model training, the Hermite interpolation method was used to interpolate between adjacent labels. The number of labeled training samples finally obtained is 4500, which is better suited to training the meta-LSTM model. The number of testing samples in each test set is about 250. During the training of the meta-learning model, the hyperparameter α is set to 0.001, and the Adam algorithm 31 is used for optimization. The convergence curve of the training process is shown in Figure 5, where the X-axis represents the training epoch and the Y-axis represents the training loss. The training process largely converged at about epoch 100, and the training loss approached its minimum at about epoch 700. During the continual learning process, the hyperparameter

Figure 5. The convergence curve of the training process.
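The label densification by Hermite interpolation mentioned in this section can be sketched as follows. The paper does not state how the derivatives at the measured labels were obtained, so this sketch assumes finite-difference slopes. The wear values, measurement times and the function name `hermite_interpolate` are hypothetical.

```python
import numpy as np

def hermite_interpolate(t_known, y_known, t_query):
    """Piecewise cubic Hermite interpolation with finite-difference slopes (assumed)."""
    t_known = np.asarray(t_known, dtype=float)
    y_known = np.asarray(y_known, dtype=float)
    t_query = np.asarray(t_query, dtype=float)
    slopes = np.gradient(y_known, t_known)       # slope estimate at each measured label

    # Index k of the interval [t_k, t_{k+1}] that contains each query point.
    k = np.clip(np.searchsorted(t_known, t_query, side="right") - 1,
                0, len(t_known) - 2)
    h = t_known[k + 1] - t_known[k]
    s = (t_query - t_known[k]) / h               # position inside the interval, 0..1

    # Cubic Hermite basis functions.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return (h00 * y_known[k] + h10 * h * slopes[k]
            + h01 * y_known[k + 1] + h11 * h * slopes[k + 1])

# Hypothetical flank-wear labels (mm) measured at a few cutting times (min).
t_meas = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
vb_meas = np.array([0.10, 0.16, 0.21, 0.27, 0.36])

t_dense = np.linspace(0.0, 20.0, 81)             # 81 interpolated labels from 5 measurements
vb_dense = hermite_interpolate(t_meas, vb_meas, t_dense)
print(vb_dense[:5])
```

The interpolated curve passes through every measured label, so the extra samples only fill in the gradual wear between measurements and do not alter the measured values.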
To obtain an effective learning model, a large number of tuning experiments were carried out on the input space dimension, the network structure and the number of base-models, among other factors. Compared with the LSTM model, the deep LSTM model and the meta-LSTM model, the proposed method improves the tool wear prediction accuracy under different cutting conditions. In particular, for group 13, whose cutting condition differs most in task distribution from the others, the proposed method adapts better than the other methods. Table 3 shows the comparison results of the different prediction models.
Table 3. Comparison of different prediction models.
Conclusion
This paper proposes a novel method, based on continual learning, for accurately predicting cutting tool wear under different cutting conditions. A meta-LSTM model is trained for specific cutting conditions and can be fine-tuned with a very small number of samples to adapt to a new cutting condition. In addition, the meta-model can be continuously updated during the machining process by learning different tasks sequentially. The meta-parameters are fine-tuned by the orthogonal weights modification method with a small number of samples from the new cutting conditions.
The experimental results show that the proposed method can realize real-time, accurate prediction of tool wear under different cutting conditions, with the final prediction error of tool wear controlled within 0.045 mm. Even when the cutting condition changes substantially, the prediction error is 0.085 mm, which is much smaller than that of the other methods. Compared with the existing meta-learning method, the range of cutting conditions to which the model can adapt is expanded, because the task distribution of new cutting conditions is continuously learned by the prediction model.
The signal collection method and the tool wear label measurement method are feasible in industrial settings, and fewer labelled samples are required for new cutting conditions, so the proposed method is expected to have good industrial applicability. In future work, the continual learning model should be further studied for different cutting tool materials and different part materials.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The reported research was funded by National Science Fund of China for Distinguished Young Scholars (grant No. 51925505), National Natural Science Foundation of China (grant No. 51775278, grant No. 51921003).
