Abstract
Workpiece quality prediction is very important in the modern manufacturing industry. However, traditional machine learning methods are highly sensitive to their hyperparameters, making tuning essential to improve prediction performance. Hyperparameter optimization (HPO) approaches such as grid search and random search are applied to tune hyperparameters. However, the hyperparameter space of a workpiece quality prediction model is high-dimensional and consists of continuous, combinational, and conditional hyperparameters, which makes it difficult to tune. In this article, a new automatic machine learning based HPO method, named adaptive Tree Parzen Estimator (ATPE), is proposed for workpiece quality prediction in high dimensions. The proposed method iteratively searches for the best combination of hyperparameters in an automatic way, and during its warm-up process it adaptively adjusts the hyperparameter intervals to guide the search. The proposed ATPE is tested on a sparse stack autoencoder based MNIST task and an XGBoost based workpiece quality dataset. The results show that ATPE provides state-of-the-art performance in high-dimensional spaces and searches hyperparameters within a reasonable range compared with the Tree Parzen Estimator, annealing, and random search, showing its potential in the field of workpiece quality prediction.
Keywords
Introduction
Workpiece quality prediction is very important in the manufacturing industry, since defects not only have negative impacts on product quality but can also reduce sales volume and even cause irreparable losses to enterprises. 1 With the development of smart manufacturing, automatic prediction of workpiece quality has become vital, and many machine learning (ML) methods have been applied effectively to workpiece quality prediction. 2
However, although most ML methods have been successfully applied in the manufacturing industry, their performance relies heavily on their hyperparameters. 3 Since default hyperparameters cannot guarantee the performance of ML models, 4 tuning the hyperparameters is an essential step for ML methods. Various tuning approaches, such as trial and error and manual search, have been developed to obtain the best hyperparameter configuration, but they still face the following barriers: (1) Tuning hyperparameters mostly depends on experts’ experience and tedious episodes of trial and error, making it time-consuming and labor-intensive. (2) The tuning process has to be repeated for each new dataset, and it is hard to decide the tuning range of the hyperparameters. (3) The combinations of hyperparameters are innumerable in high dimensions, which makes the best combination hard to find. As a result, the tuning process is slow and labor-intensive, and it easily converges to a suboptimal hyperparameter configuration.
To overcome the above drawbacks, some approaches tune the hyperparameters automatically; these are denoted hyperparameter optimization (HPO). The most prominent HPO approaches are grid search, random search, and Bayesian optimization, and HPO has been applied in many fields. Shi et al. 5 applied grid search to tune a few hyperparameters in order to improve the accuracy of tilt angle monitoring. Hao et al. 6 used random search to optimize several control hyperparameters for efficiency. McParland et al. 7 used Bayesian optimization to optimize hyperparameters and improve the prediction of tool wear rates. Compared with trial and error and manual search, HPO methods are easy to use and can achieve state-of-the-art results in some tasks. HPO methods are effective and efficient, but they need further improvement when the hyperparameter space is complex: the hyperparameter space of a workpiece quality prediction model is high-dimensional, with continuous, combinational, and conditional hyperparameters, and is difficult to tune.
In this research, a new automatic machine learning (AutoML) based HPO method, named adaptive Tree Parzen Estimator (ATPE), is proposed for workpiece quality prediction. First, it models the tuning process as sequential model-based optimization (SMBO) and iteratively searches for promising hyperparameter combinations in an automatic way. Second, ATPE updates the width of the search interval based on the historical information of the HPO run, improving the warm-up process of the Tree Parzen Estimator (TPE).
The main contribution of this article is an adaptive warm-up process for TPE, named ATPE, which can automatically tune hyperparameters for workpiece quality prediction in high dimensions. The proposed ATPE is tested on two datasets, MNIST and a workpiece quality prediction dataset. The results show that ATPE provides state-of-the-art HPO performance in high dimensions compared with random search (RS), annealing, and TPE.
The rest of this article is organized as follows. Section “Literature review” reviews the related work. The next section introduces “HPO based on AutoML.” Section “The proposed ATPE for HPO in high dimension” gives the methodology of the proposed ATPE. Section “Case studies and results” shows the experimental results of ATPE on two datasets. Section “Conclusion and future researches” presents the conclusion and future research directions.
Literature review
This section reviews ML applications on quality prediction and AutoML based HPO.
ML applications on quality prediction
Automatic process monitoring methods have raised great concern in recent years because hand-crafted quality identification is tedious, time-consuming, laborious, and prone to errors and omissions. Many ML methods have been applied to quality prediction in various fields. Scime and Beuth 8 used Decision Tree and Support Vector Machine for additive manufacturing quality identification; El Mazgualdi et al. 9 applied Random Forest, XGBoost, and Deep Learning to predict efficiency in the manufacturing industry; Li et al. 10 and Zhang et al. 11 showed that data-driven algorithms are highly effective tools for automatic feature extraction and quality monitoring.
However, even though ML methods have been widely applied to workpiece quality identification and prediction,12,13 their performance depends heavily on hyperparameters. Since workpiece quality prediction models have complex hyperparameter spaces, tuning them is challenging, and it is promising to develop an automatic hyperparameter tuning process for workpiece quality prediction models.
HPO in AutoML
AutoML aims to use ML methods in a data-driven and automated way, and AutoML systems surpass human experts in some tasks. 14 HPO is the most popular task in AutoML, and results have shown that the performance of ML can be improved by HPO.3,4
If the ML algorithm A has hyperparameters λ drawn from a configuration space Λ, HPO seeks the configuration λ* that minimizes the validation loss of the model trained with λ, that is, λ* = argmin λ∈Λ L(Aλ, Dtrain, Dvalid).
Many HPO methods have been investigated in the field, such as grid search, random search, evolutionary algorithms, and Bayesian optimization.
HPO methods are effective and efficient, but they need further improvement when hyperparameter spaces are complex and high-dimensional, as in manufacturing applications. In this article, a new AutoML based HPO method is developed for workpiece quality prediction.
HPO based on AutoML
This section presents the workflow of HPO based on AutoML, introduces TPE, and explains the procedure for generating the next promising point.
Workflow of HPO based on AutoML
The most promising framework for expensive black-box function optimization in AutoML is Bayesian optimization (BO). It is an iterative algorithm consisting of two critical components: the probabilistic surrogate model and the acquisition function. The workflow of HPO is presented in Figure 1.
Step 1: Initialize. Sample several points from the configuration space and evaluate them.
Step 2: Fit data. Use the probabilistic surrogate model TPE (see section “Tree Parzen estimator”) to establish the prior and posterior distributions based on the observed data.
Step 3: Generate the next point. Obtain the next promising point by maximizing the acquisition function.
Step 4: Evaluate. Evaluate the chosen point by computing its objective function value.
Step 5: Update. Update the observed data with the new point and its objective value.
Step 6: Repeat steps 2–5 until the required number of points has been sampled.

The workflow of HPO.
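The six steps above can be sketched as a minimal SMBO loop. This is only an illustrative skeleton: the "surrogate" here is just the incumbent (best point so far) and the acquisition is a crude distance-plus-noise heuristic, stand-ins for TPE and EI, not the article's method.

```python
import random

def objective(x):
    # Toy objective to minimize; stands in for an expensive model evaluation
    return (x - 0.3) ** 2

def smbo(n_init=5, n_iter=20, n_candidates=50, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize with random samples from the configuration space [0, 1]
    history = [(x, objective(x)) for x in (rng.random() for _ in range(n_init))]
    for _ in range(n_iter):
        # Step 2: "fit" a surrogate -- here simply the best observation so far
        # (a placeholder; a real surrogate such as TPE models p(x|y))
        incumbent, _ = min(history, key=lambda h: h[1])
        # Step 3: generate the next point by maximizing a crude acquisition:
        # prefer candidates close to the incumbent, with random exploration
        candidates = [rng.random() for _ in range(n_candidates)]
        next_x = min(candidates,
                     key=lambda c: abs(c - incumbent) + 0.1 * rng.random())
        # Step 4: evaluate the chosen point
        y = objective(next_x)
        # Step 5: update the observation history
        history.append((next_x, y))
    # Step 6 is the loop itself: repeat until the evaluation budget is spent
    return min(history, key=lambda h: h[1])

best_x, best_y = smbo()
```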
The probabilistic surrogate model and acquisition function are shown in Figure 2. The probabilistic surrogate model is used to model the target function by fitting all current observations. The acquisition function describes the trade-off strategy between exploration and exploitation; the next sampling point is generated by maximizing it. In Figure 2(a), the highest acquisition value occurs where the posterior mean is low and the posterior uncertainty is high. The chosen point is then evaluated to update the observations. In Figure 2(b), when enough points have been sampled, the probabilistic surrogate model approximates the objective function closely.

The probabilistic surrogate model and acquisition function in HPO.
Tree Parzen estimator
TPE was proposed by Bergstra et al. 25 It reduces computation by modeling p(x|y) instead of p(y|x): the observations are split at a threshold y*, and p(x|y) is represented by a density l(x) built from the observations whose objective values are better than y* and a density g(x) built from the remaining ones, where y* is chosen as a quantile of the observed objective values. Both densities are established with Parzen (kernel density) estimators over the observed hyperparameter values.

The establishment process of the TPE surrogate model.

Generally, only a small fraction of the observations falls below y*, so the density of good configurations concentrates probability on the regions of the configuration space that produced good results.
Procedure to generate next promising point
The acquisition function trades off exploration and exploitation to select the most promising point, that is, a point expected to improve on the best objective value observed so far. Three common acquisition functions are:

1. Probability of improvement (PI). The formulation of PI is presented by equation (3). It considers the probability of improvement but ignores its magnitude.

2. Upper confidence bound (UCB). UCB uses a tunable coefficient to trade off the posterior mean against the posterior uncertainty.

3. Expected improvement (EI). EI is the most frequently used function since it considers both the probability and the magnitude of improvement of a point, as shown in equation (5).

Under TPE, maximizing EI is equivalent to maximizing the ratio l(x)/g(x); candidates are therefore drawn and the one with the largest ratio is selected as the next point.
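As a rough illustration of this EI maximization, the sketch below splits a toy observation history at a γ-quantile, models l(x) and g(x) with fixed-bandwidth Gaussian kernel densities, and picks the candidate maximizing l(x)/g(x). The bandwidth, γ, and the candidate-jitter scheme are illustrative choices, not the article's exact settings.

```python
import math
import random

def kde(points, x, bw=0.1):
    # Simple Gaussian kernel density estimate with a fixed bandwidth
    return sum(math.exp(-0.5 * ((x - p) / bw) ** 2) for p in points) \
        / (len(points) * bw * math.sqrt(2 * math.pi))

def tpe_suggest(history, gamma=0.25, n_candidates=100, seed=1):
    """history: list of (x, y) pairs; smaller y is better."""
    rng = random.Random(seed)
    # Split observations at the gamma-quantile of the objective values
    history = sorted(history, key=lambda h: h[1])
    n_good = max(1, int(gamma * len(history)))
    good = [x for x, _ in history[:n_good]]   # models l(x)
    bad = [x for x, _ in history[n_good:]]    # models g(x)
    # Draw candidates near the good points (a crude sample from l(x)) and
    # pick the one maximizing l(x)/g(x), which maximizes EI under TPE
    candidates = [min(1.0, max(0.0, rng.choice(good) + rng.gauss(0, 0.1)))
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda c: kde(good, c) / (kde(bad, c) + 1e-12))

# Toy history: objective (x - 0.7)^2 observed at a few points
hist = [(x, (x - 0.7) ** 2) for x in [0.1, 0.3, 0.5, 0.65, 0.75, 0.9]]
x_next = tpe_suggest(hist)
```

The suggested point lands near the best observations (around 0.65), where l(x) is large relative to g(x).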
The proposed ATPE for HPO in high dimension
This section presents the workflow of the proposed ATPE, and the details of the adaptive warm-up process for TPE.
Workflow of ATPE
The workflow of ATPE is shown in Figure 4. There are three modules in ATPE: the adaptive warm-up process, TPE, and EI. In ATPE, the method first generates and evaluates some observations (set to 20 points in this research) by the adaptive warm-up process. Then ATPE builds the probability density functions separately with TPE to fit the observations from the warm-up process. Finally, ATPE uses EI to generate the next point. This process continues until the maximum number of evaluations is reached.

The workflow of ATPE.
The adaptive warm-up process
The configuration space contains various kinds of hyperparameters, such as continuous, discrete, categorical, and conditional ones, and they are described by a distribution and an interval; the width of the interval is one term in the objective function of the warm-up process. The width of the interval is set inversely proportional to the number of iterations, so the sampling interval narrows as more points are evaluated.

Illustration of adaptive warm-up process.
Let
The proposed adaptive warm-up process for TPE is shown in Algorithm 1. Note that minimization of the objective function is assumed.
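Since Algorithm 1's details are not reproduced here, the following is only a sketch of the stated idea: sample uniformly from an interval whose width shrinks inversely with the iteration count, re-centered on the best observation so far. The 1/t schedule and the re-centering rule are assumptions for illustration, not necessarily the article's exact algorithm.

```python
import random

def adaptive_warmup(objective, low, high, n_warmup=20, seed=2):
    """Sketch of an adaptive warm-up: the sampling interval's width shrinks
    inversely with the iteration count and is re-centered on the incumbent.
    Minimization of the objective is assumed."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    history = []
    for t in range(1, n_warmup + 1):
        width = (high - low) / t          # interval width shrinks as 1/t
        center = best_x if best_x is not None else (low + high) / 2
        lo = max(low, center - width / 2)  # clip to the configuration space
        hi = min(high, center + width / 2)
        x = rng.uniform(lo, hi)
        y = objective(x)
        history.append((x, y))
        if y < best_y:                    # keep track of the incumbent
            best_x, best_y = x, y
    return history, best_x, best_y

hist, bx, by = adaptive_warmup(lambda x: (x - 2.0) ** 2, 0.0, 10.0)
```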
The proposed ATPE can automatically tune the hyperparameters of ML models during the training process, avoiding repetitive and tedious manual search, especially in high dimensions. In this research, it is applied to workpiece quality prediction to tune eight different hyperparameters.
Case studies and results
To validate the effectiveness and efficiency of ATPE, case studies on the MNIST dataset and the workpiece quality dataset are conducted. The proposed method is implemented with TensorFlow and Hyperopt in Python and runs on Ubuntu 16.04 with an RTX 2080Ti GPU.
Cross validation is a popular technique in ML for obtaining stable and reliable predictions. Hence, the proposed ATPE uses fivefold cross validation for fair comparison. There are 50 evaluations in each run, and all experimental results are reported over ten independent runs.
Case 1: MNIST dataset
The MNIST database of handwritten digits, a typical computer vision dataset, has 60k training and 10k testing samples of ten handwritten digits (0–9) with a size of 28 × 28 pixels. The sparse stack autoencoder (SSAE) is often used in industrial image recognition applications due to its efficient feature extraction and simple implementation. Its hyperparameters influence its performance heavily, so it is important to apply HPO to SSAE. In this case study, the proposed ATPE is used for the HPO of SSAE.
The fixed training settings of SSAE are: the number of epochs is 30, and AdamOptimizer is applied to minimize the loss function.
The hyperparameters of SSAE are as follows: (i) the number of stacked AEs; (ii) the number of hidden units of the first autoencoder; (iii) learning rate; (iv) sparse coefficient; (v) weight balance coefficient; (vi) sparsity balance factor; (vii) batch size; and (viii) sparse coefficient. Table 1 provides more details about these hyperparameters. The symbol U means uniform, and qU stands for discrete uniform, drawn as round(uniform(low, high)/q) * q. The sampling method is random. Note that the hidden layer units can be formulated as equation (9): given the number of units of the first layer, the number of units in each subsequent hidden layer decreases by an equal difference.
Hyperparameters that need to be optimized of SSAE on MNIST.
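The U and qU sampling rules from Table 1 can be written directly. The ranges below (a learning rate and a batch size) are hypothetical, chosen only for illustration; the round(uniform(low, high)/q) * q formula is the one given in the text.

```python
import random

def sample_uniform(low, high, rng):
    # U(low, high): continuous uniform draw
    return rng.uniform(low, high)

def sample_quniform(low, high, q, rng):
    # qU(low, high, q): round(uniform(low, high) / q) * q, as in Table 1,
    # which yields a value snapped to the nearest multiple of q
    return round(rng.uniform(low, high) / q) * q

rng = random.Random(3)
# Hypothetical ranges: a learning rate from U(1e-4, 1e-2)
# and a batch size from qU(32, 256, 32)
lr = sample_uniform(1e-4, 1e-2, rng)
batch = sample_quniform(32, 256, 32, rng)
```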
The space has eight dimensions, and it can be estimated that there are almost 4800 combinations of hyperparameters in Table 1 (Esti. Num. means the estimated number of feasible values). The estimate is obtained as follows: (1) The number of hidden units has a large range and is usually taken at equal intervals (here 50) in human design, giving 12 values. (2) The sparsity coefficient is the core of SSAE, so it is given five choices. (3) Apart from the categorical and discrete hyperparameters, the others are considered to have at least two values each. The total number of combinations is the product of these counts.
Results on accuracy and convergence
In this case study, both accuracy and time are taken into consideration. Table 2 and Figure 6 show the experimental results of 10 runs, including the min, max, mean, and std of accuracy. From these results, it can be concluded that, compared with RS, annealing, and TPE, the proposed algorithm shows superior performance in high-dimensional space: it achieves the best max value, 97.68%, and the best mean value, 97.22%. Table 3 gives the ANOVA analysis of the four methods; the results show that the stability and optimization ability of ATPE are better, and the result of ATPE is significantly better than those of the other three methods, since the p-value is below the significance level.
Experimental statistics on MNIST (%).

The best result convergence curves of ATPE and TPE in case 1.
The ANOVA of four algorithms in case 1.
The best result convergence curves of ATPE and TPE are shown in Figure 6; the first 20 iterations are the warm-up process. ATPE performs as well as TPE without human interference, and it finds the best result more quickly.
To study the efficiency of the algorithms further, the iteration index of the best result in each run is recorded. As described in Table 4, ATPE has an 80% probability of finding the optimal value within the first 30 iterations, while TPE and annealing need more evaluations. This indicates that the adaptive warm-up process accelerates the convergence of TPE in high-dimensional space when applied to SSAE.
Argmin of objective function value from each run in case 1.
The convergence of hyperparameters
The convergence of hyperparameters in ATPE is also examined. Taking the fourth run of ATPE as an example (Figure 7), the objective function values are sorted in descending order and split at the first quartile; the first quartile contains the better results and the rest the worse ones. Histograms are drawn for discrete hyperparameters, and density functions based on a Gaussian kernel are drawn for continuous ones.

Sampling space of SSAE in ATPE.
As shown in Figure 7, the sampled values of hyperparameters such as the sparsity coefficient concentrate in narrower regions as the search proceeds. These results show experimentally that ATPE automatically improves the hyperparameter configuration of SSAE in high dimensions and finds the regions more likely to generate good solutions in this case study.
Case 2: workpiece quality dataset
The workpiece quality dataset is from the competition “prediction of quality conformity rate of typical workpieces in discrete manufacturing process,” held by the China Computer Federation. It comes from real data collected by a factory and has been desensitized; it is referred to as the workpiece quality dataset in the rest of this article. The dataset includes two types of features: (1) 10 classes of equipment processing parameters, referred to as P; and (2) 10 attributes of quality indices, referred to as A. The quality level is divided into four categories: excellent, good, pass, and fail. The notation “Fail-0” denotes that the category fail is encoded as 0 during training. There are 12,934 training samples and 6000 test samples.
The dataset is unbalanced, as can be seen from Table 5. Therefore, stratified K-fold cross validation (with K = 5) is applied. Through descriptive statistics and polynomial construction, 45 features are constructed, including statistical features (e.g. mean, std, frequency).
The number of categories.
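The stratified splitting described above can be sketched in plain Python: assign each class's samples round-robin across folds so every fold keeps roughly the same class proportions. The round-robin scheme is a simple illustrative choice, not scikit-learn's exact algorithm.

```python
from collections import defaultdict

def stratified_kfold_indices(labels, k=5):
    """Assign sample indices to k folds so each fold keeps roughly the same
    class proportions -- a minimal sketch of stratified K-fold splitting."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Distribute each class's samples round-robin across the folds
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)
    return folds

# Toy imbalanced labels: 80 of class 0, 15 of class 1, 5 of class 2
labels = [0] * 80 + [1] * 15 + [2] * 5
folds = stratified_kfold_indices(labels, k=5)
```

With these counts, every fold receives 16 class-0, 3 class-1, and 1 class-2 samples, preserving the imbalance ratio within each fold.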
XGBoost greatly improves model training speed and prediction accuracy. Because of its excellent performance and low computational complexity, it has been widely used in industry. 27 Its hyperparameters have a great impact on its accuracy, so it is realistic to study the HPO of XGBoost. In this research, XGBoost is implemented with the XGBoost package, and the hyperparameters to be optimized are described in Table 6. Among them, subsample denotes the fraction of observations randomly sampled for each tree, and colsample_bytree denotes the fraction of columns randomly sampled for each tree; both reflect the randomness of XGBoost.
Hyperparameters that need to be optimized of XGBoost.
The hyperparameter space in this case study has eight dimensions, and there are an estimated 6400 combinations of hyperparameters in Table 6 (Esti. Num. means the estimated number of feasible values). The estimate is obtained as follows: (1) n_estimators takes values at equal intervals (here 100) in human design, giving 5 values. (2) Subsample mainly reflects the randomness and accuracy of XGBoost, so it is given 5 choices. (3) Apart from the categorical and discrete hyperparameters, the others are considered to have at least 2 values each. The total number of combinations is the product of these counts.
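The subsample and colsample_bytree semantics described above, where each tree sees a random fraction of rows and of feature columns, can be illustrated with plain Python. The function and the fractions below are illustrative only, not XGBoost's internal implementation.

```python
import random

def subsample_rows_cols(X, subsample=0.8, colsample_bytree=0.5, seed=4):
    """Illustrate the subsample / colsample_bytree semantics: before growing
    a tree, keep a random fraction of rows and a random fraction of columns."""
    rng = random.Random(seed)
    n_rows, n_cols = len(X), len(X[0])
    rows = rng.sample(range(n_rows), int(subsample * n_rows))
    cols = rng.sample(range(n_cols), int(colsample_bytree * n_cols))
    return [[X[r][c] for c in cols] for r in rows]

# Toy data: 100 rows, 10 feature columns
X = [[r * 10 + c for c in range(10)] for r in range(100)]
Xt = subsample_rows_cols(X)  # 80 rows, 5 columns remain
```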
Results on accuracy and convergence
The results of ATPE are compared with the other algorithms in Table 7. The average accuracy of ATPE reaches 55.3209% over 10 independent runs, which is better than TPE, annealing, and RS.
Cross validation results of algorithms of XGBoost (%).
The best result convergence curves of ATPE and TPE are shown in Figure 8. The first 20 iterations are the warm-up process; there is an obvious improvement over TPE after applying the adaptive algorithm, and ATPE performs better than TPE without human interference.

The best result convergence curves of ATPE and TPE in case 2.
The iteration index of the best result of each run is given in Table 8; both ATPE and TPE have a 90% probability of finding the optimal value within the first 40 iterations, while ATPE has the smallest median value, 29. This case experimentally shows that, in high dimensions, the adaptive warm-up process gives TPE faster convergence when applied to XGBoost.
Argmin of objective function value from each run in case 2.
The convergence of hyperparameters
The hyperparameters in this case have eight dimensions. Taking the best run of ATPE as an example, the results show that during the training process ATPE is able to automatically optimize and schedule the hyperparameters well even in high-dimensional space, as shown in Figure 9.

Sampling space of XGBoost in ATPE.
In this case study, max_depth performs well at the value 9, which is consistent with human design. Relatively fewer estimators are more likely to give better performance because the dataset is small, contrary to the common assumption that more estimators yield better predictions. Even though the configuration space is described by uniform distributions, most hyperparameters (e.g. learning_rate, reg_alpha, subsample, colsample) are scheduled into small ranges where the probability of generating better results is larger, during the automatic iterations.
Conclusion and future researches
This article presents a new AutoML based HPO method for workpiece quality prediction, named ATPE. The main contribution is as follows: ATPE provides an adaptive warm-up process for TPE and can automatically tune hyperparameters in high dimensions. The proposed algorithm is tested on an SSAE based MNIST task and an XGBoost based workpiece quality dataset. The results show that it accelerates the convergence of eight hyperparameters without human interference and outperforms RS, annealing, and TPE; ATPE not only eases the hyperparameter tuning process but also achieves state-of-the-art performance, validating its potential for workpiece quality prediction.
A limitation of the proposed algorithm is that it requires hyperparameters to remain unchanged during each evaluation, so it cannot handle hyperparameters that change during the training process. Future research could therefore introduce reinforcement learning to tune such hyperparameters.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by National Key R&D Program of China (Grant No. 2019YFB1704600), National Natural Science Foundation of China (Grant No. 51805192, 51825502) and the State Key Laboratory of Digital Manufacturing Equipment and Technology (DMET) of Huazhong University of Science and Technology (HUST) under Grant No. DMETKF2020029.
