
Abstract

This article aims to improve the convergence efficiency of the traditional efficient global optimization method. In addition, a graphics processing unit–based parallel computing method is used to improve the computing efficiency of the efficient global optimization method for both mathematical and practical engineering problems. First, we propose a multiple-data-based efficient global optimization algorithm as an alternative to the multiple-surrogates-based efficient global optimization algorithm. Second, graphics processing unit–based general-purpose computing is adopted to accelerate the multiple-data-based efficient global optimization algorithm. Third, a hybrid parallel computing approach using OpenMP and the compute unified device architecture is adopted to further speed up the solution of forward problems in practical applications. This is accomplished by integrating a graphics processing unit–based finite element analysis system into the optimization software. The numerical results show that, for the same problem and the same number of optimization iterations, the multiple-data-based efficient global optimization algorithm consistently reaches a better optimum than the multiple-surrogates-based efficient global optimization algorithm. In addition, the graphics processing unit–based parallel simulation system reduces the calculation time for practical engineering problems. The multiple-data-based efficient global optimization method performs stably on both high-order mathematical functions and large-scale nonlinear practical engineering optimization problems. As an added benefit, computational time and accuracy are no longer obstacles.

Keywords

Global optimization, parallel computing, graphics processing units, compute unified device architecture, automotive body design

Introduction

Currently, optimization design is widely used in the automotive industry to optimize production or to test the behavior of the product before a prototype is built or real-life tests are performed.^1,2 However, since a number of function evaluations are repeatedly performed, a practical engineering optimization is time-consuming. Therefore, a surrogate-based optimization (SBO) algorithm has been suggested as an effective approach for designs using time-consuming models. Within SBO, an iterative process that involves the creation, optimization, and updating of a fast and analytically tractable surrogate model is proposed to replace the direct optimization of the computationally expensive model.^3,4 The surrogate model is used to visualize input–output relationships, search for an optimum candidate, and finally to analyze the improved design.⁵ The common surrogate models include response surfaces,⁶ Kriging,^7,8 support vector machines,⁹ and space mapping.¹⁰ For the automobile industry, various kinds of surrogate models are widely applied to the structural and parameter optimization of the sheet-metal forming process^11,12 and crash-safety design.¹³ This article focuses on a modern SBO methodology called the efficient global optimization (EGO) algorithm, which is proposed by Jones et al.¹⁴ The EGO algorithm favors high convergence efficiency for global optimization by using both estimated prediction and estimated uncertainty to provide the next sampling point. It is especially good for nonlinear, multimodal functions that often occur in industrial applications. Since more robust and accurate approximation can be achieved by the use of an ensemble of surrogates, a multiple-surrogate efficient global optimization (MSEGO) algorithm is presented.¹⁵ This algorithm adds several design points per optimization cycle and is benefitted by the use of multiple surrogates. 
Numerical results show that the MSEGO algorithm can reduce the number of iterations required for convergence, thus reducing the number of function evaluations.¹⁵ Unfortunately, since the best surrogate for most black-box problems is not known beforehand, an ensemble of surrogates is necessary to extract more information from the data and ensure robustness. However, several surrogates, such as the support vector machine (SVM) and radial basis neural network (RBNN), consume more resources than the Kriging model. It is also difficult to choose suitable surrogates to generate diversity, although some selection strategies have been presented, and the results may even become worse in some cases when the number of surrogates increases.⁴ In this article, we propose a multiple-data-based efficient global optimization (MDEGO) algorithm, which uses a set of multiple data to generate diversity. The MDEGO algorithm replaces multiple surrogates with multiple data, adding several infill points in each optimization cycle, and the multiple data are drawn from the globally sampled data. Compared with multiple surrogates, multiple data are easier to obtain: one simply chooses subsets of the sampled data according to a selection strategy. Furthermore, the multiple-data strategy is well suited for parallel computing.

In recent years, graphics processing units (GPUs) have become a novel parallel tool since they offer a tremendous amount of computing resources not only for graphics processing but also for general-purpose parallel computations.^16,17 One of the promising trends in the field of parallel global optimization is the use of GPUs to speed up computation.¹⁸ For example, Barkalov et al.¹⁹ presented a parallel global optimization algorithm combined with a dimension reduction scheme. There are many other GPU-based parallel global optimization algorithms, including the fine-grained parallel particle-swarm optimization method,²⁰ the genetic algorithm,²¹ the differential evolution algorithm,²² the shuffled complex evolution method developed at The University of Arizona (SCE-UA),²³ and the simulated annealing algorithm.²⁴ Furthermore, a review of modern methods of parallel global optimization can be found in the work of D’Apuzzo et al.²⁵ On the other hand, GPUs are widely used to accelerate finite element (FE) applications, including Monte Carlo simulation,²⁶ dynamic explicit FE analysis,²⁷ electromagnetic analysis,²⁸ and so on. Based on these studies, a parallel computing approach has been studied to improve the computing efficiency of optimal design based on the MDEGO algorithm.

The rest of this work is structured as follows. In section “EGO algorithm,” the EGO method is briefly reviewed. In section “General-purpose computing on GPU,” the general-purpose computing by GPU is briefly introduced. The MDEGO algorithm based on multiple data and its parallel implementation are presented in section “MDEGO and its parallel implementation.” In section “Numerical examples,” some numerical examples are given to demonstrate the effectiveness and efficiency of the MDEGO algorithm. Section “Conclusion” offers some conclusions and discusses some remaining challenges and opportunities.

EGO algorithm

In the context of the EGO algorithm, the response function can be expressed as the sum of two terms

y (x) = b (x) + z (x)

(1)

where b(x) is the regression polynomial model and describes the global trend of the response function, and the second term $z (x)$ is a stochastic process giving the local deviations from the global trend. The correlation function between points $x^{(i)}$ and $x^{(j)}$ is defined in equation (2) and controls the smoothness of the surrogate model. Here, the Gaussian correlation function is chosen

R (x^{(i)}, x^{(j)}) = \exp (- \sum_{k = 1}^{m} θ_{k} {| x_{k}^{(i)} - x_{k}^{(j)} |}^{2})

(2)

where x denotes the design variable vector, m represents the number of design variables, and $θ_{k}$ is the unknown correlation function parameter measuring the importance or “activity” of the design variable x_k. This means that large values of $θ_{k}$ will translate small values of $| x_{k}^{(i)} - x_{k}^{(j)} |^{2}$ into large “distance” and hence low correlation. Small values of $θ_{k}$ will smoothen the Kriging prediction as the correlation function drops off less rapidly with the change in the design variable x.
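The effect of $θ_{k}$ described above can be checked numerically. The following is a minimal Python sketch of equation (2); the function names `gaussian_correlation` and `correlation_matrix` are illustrative and do not come from the original work.

```python
import math

def gaussian_correlation(xi, xj, theta):
    """Equation (2): R = exp(-sum_k theta_k * |xi_k - xj_k|^2)."""
    if not (len(xi) == len(xj) == len(theta)):
        raise ValueError("xi, xj and theta must all have length m")
    weighted_sq_dist = sum(t * abs(a - b) ** 2 for t, a, b in zip(theta, xi, xj))
    return math.exp(-weighted_sq_dist)

def correlation_matrix(points, theta):
    """p x p matrix whose (i, j) entry is R(x^(i), x^(j))."""
    return [[gaussian_correlation(a, b, theta) for b in points] for a in points]

# Same pair of points, two different activity levels of the design variables.
x_a, x_b = [0.0, 0.0], [0.5, 0.5]
low_activity = gaussian_correlation(x_a, x_b, [0.1, 0.1])
high_activity = gaussian_correlation(x_a, x_b, [10.0, 10.0])
print(low_activity, high_activity)
```

With the larger $θ_{k}$ the same pair of points is much less correlated, which is the "large distance, low correlation" behaviour noted in the text.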

According to the Kriging model, the response value $\hat{y}$ at point x can be estimated, and the expression for this value is

\hat{y} (x) = f \hat{β} + r^{T} R^{- 1} (y - f \hat{β})

(3)

where f is a vector of ones with length equal to the number of sampled points p, $\hat{β}$ is an estimator for the regression model, r is the vector of correlations between a new estimated point x and the sample points, R is the correlation matrix whose entries are the correlations between the sample points $x^{(i)}$ and $x^{(j)}$, and y is the observation vector with length equal to the number of current evaluations p.

The Kriging prediction variance can also be estimated and should be minimized in order to obtain a good approximation

s^{2} = {\hat{σ}}^{2} [1 - r^{T} R^{- 1} r + \frac{{(1 - f^{T} R^{- 1} r)}^{2}}{f^{T} R^{- 1} f}]

(4)

where ${\hat{σ}}^{2}$ denotes the estimated variance. R and r depend on parameter $θ_{k}$ , which is chosen by maximizing the likelihood of the sample.
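Equations (3) and (4) can be evaluated directly once $θ_{k}$ is fixed. The sketch below is a pure-Python illustration under several assumptions that are not stated in the text: a constant regression term, the usual generalized least-squares estimate for $\hat{β}$, the maximum-likelihood estimate for ${\hat{σ}}^{2}$, and a user-supplied `theta` instead of a likelihood maximization. The helper names (`solve`, `corr`, `kriging_predict`) are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def corr(a, b, theta):
    """Gaussian correlation of equation (2)."""
    return math.exp(-sum(t * (u - v) ** 2 for t, u, v in zip(theta, a, b)))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kriging_predict(X, y, theta, x_new):
    """Return (y_hat, s2) at x_new following equations (3) and (4)."""
    p = len(X)
    R = [[corr(a, b, theta) for b in X] for a in X]
    f = [1.0] * p                                  # vector of ones
    r = [corr(x_new, a, theta) for a in X]
    Rinv_f, Rinv_y, Rinv_r = solve(R, f), solve(R, y), solve(R, r)
    beta = dot(f, Rinv_y) / dot(f, Rinv_f)         # assumed GLS estimate
    resid = [yi - beta for yi in y]                # y - f * beta
    Rinv_resid = solve(R, resid)
    y_hat = beta + dot(r, Rinv_resid)              # equation (3)
    sigma2 = dot(resid, Rinv_resid) / p            # assumed MLE of the variance
    s2 = sigma2 * (1.0 - dot(r, Rinv_r)
                   + (1.0 - dot(f, Rinv_r)) ** 2 / dot(f, Rinv_f))  # equation (4)
    return y_hat, max(s2, 0.0)                     # clip tiny negative round-off

X_demo = [[0.0], [0.5], [1.0]]
y_demo = [1.0, 0.2, 0.8]
print(kriging_predict(X_demo, y_demo, [5.0], [0.25]))
```

At a sampled point the predictor reproduces the observation and the variance vanishes, while between samples the variance is positive; this is the uncertainty information that EGO exploits.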

EGO iteratively adds points to the data set in order to improve on the current best function value $f_{\min} = \min (y^{1}, y^{2}, \dots, y^{p})$. The improvement at the point x is defined as

I (x) = \max (f_{\min} - y (x), 0)

(5)

The expression is a random variable, since the response function y(x) is a random variable as a realization of a Gaussian process. Then, the expected improvement EI(x) can be expressed in closed form

EI (x) = (f_{\min} - \hat{y} (x)) Φ (\frac{f_{\min} - \hat{y} (x)}{s (x)}) + s (x) ϕ (\frac{f_{\min} - \hat{y} (x)}{s (x)})

(6)

In the above, $Φ (\cdot)$ and $ϕ (\cdot)$ are the cumulative distribution function and the probability density function of the standard normal distribution, respectively. $\hat{y} (x)$ and s(x) denote the Design and Analysis of Computer Experiments (DACE) prediction and the prediction standard deviation at point x.
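The closed form in equation (6) needs only the standard normal distribution functions. A minimal sketch using the Python standard library follows; the function names are illustrative, and the handling of s(x) = 0 (falling back to the deterministic improvement of equation (5)) is an assumption rather than something stated in the text.

```python
import math

def normal_pdf(u):
    """Standard normal probability density function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def normal_cdf(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def expected_improvement(f_min, y_hat, s):
    """Equation (6) for a minimization problem."""
    if s <= 0.0:
        # Assumed convention: with no predictive uncertainty the improvement
        # is deterministic, as in equation (5).
        return max(f_min - y_hat, 0.0)
    u = (f_min - y_hat) / s
    return (f_min - y_hat) * normal_cdf(u) + s * normal_pdf(u)

# A point predicted to be worse than f_min still has a positive EI when s > 0.
print(expected_improvement(f_min=1.0, y_hat=2.0, s=1.0))
```

The first term rewards a low prediction and the second rewards a large uncertainty, which is how the criterion balances local and global search.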

General-purpose computing on GPU

Driven by the insatiable market demand for real-time, high-definition three-dimensional (3D) graphics, the programmable GPU is designed as a parallel device of the single-instruction multiple-data (SIMD) class, which is well suited for problems that can be expressed as data-parallel computations with high arithmetic intensity. The compute unified device architecture (CUDA) is a hardware and software architecture for issuing and managing computations on the GPU. CUDA allows developers to use C as a high-level programming language.

A typical GPU computing platform based on an ordinary personal computer is shown in Figure 1. It mainly includes an arbitrary central processing unit (CPU) and one or more CUDA-enabled GPUs. Generally, the CPU hands the parallelizable computations of an application off to the GPUs. To implement parallel code on a GPU with CUDA, four steps are usually needed.²⁹ First, initial work, including memory space allocation, is done on the host; second, the input data are transferred from host to device; third, the kernels are executed on the GPU by the streaming multiprocessors (SMs); and finally, the output data are transferred from device to host.

Figure 1.

GPU-based general-purpose computing platform.

By means of SIMD, the most common approach to GPU parallel computing is to decompose a problem into well-defined, thread-level work units. In CUDA, kernels are the computing subroutines containing the parallel code, and they are executed concurrently by many CUDA threads. As shown in Figure 1, all CUDA threads are managed at three levels, namely threads, blocks, and grids. The threads in one block can communicate and synchronize with each other through synchronization functions. Furthermore, a multilevel device memory system is provided to make full use of the memory bandwidth. First, all CUDA threads can access the same global memory space. Second, threads in one block can share data through a dedicated shared memory. Third, each thread has a private local memory and a set of registers, which provide high access speed.

Although CUDA itself is a C language environment, it is designed to work with programming languages such as Fortran,³⁰ Python,³¹ Ruby,³² and so on. In addition, MATLAB's Parallel Computing Toolbox (PCT) supports GPU computing from the R2014 release, which makes it easier for developers to use GPUs to accelerate their applications than using C or Fortran. More than 100 built-in MATLAB functions can be executed directly in parallel on a GPU by providing an input argument of the type gpuArray, a special array type provided by PCT.³³ Furthermore, the MEX compiler in MATLAB allows software developers to call CUDA-based parallel computing libraries from MATLAB functions.

Furthermore, in order to improve the computational efficiency of the code, there are many manuals for reference, including the official guide³⁴ and the corresponding literature.^35,36

MDEGO and its parallel implementation

Concept and basic flow

Instead of the use of multiple surrogates, the MDEGO algorithm uses multiple data to provide diversity. As shown in Figure 2, it starts by sampling the initial points employing the design of experiments (DOE) technique like the Latin hypercube³⁷ function lhsdesign in MATLAB. The set of initial points is then evaluated sequentially to obtain an initial data set ( X , Y ) using a high-fidelity model, such as the finite element method (FEM). The sub-data $s_{i}$ are then extracted from the initial data set by the extraction strategy described in section “Extraction strategy of sub-data,” where i denotes the ID number of the sub-data. For every sub-data, each iteration of the algorithm starts by individually fitting a Kriging³⁸ surrogate using the extracted points. The surrogate model using sub-data $s_{i}$ then starts the infill search to find an infill point by maximizing the expected improvement. A set of infill points (up to n ) is then provided by the set of n sub-data per cycle instead of generating one single point at a time by EGO. Once determined by function evaluations, each of the infill points and its corresponding design is added to update the sub-data $s_{i}$ if it satisfies the selection criterion presented in section “Selection strategy of infill points.” The algorithm continues to restart the infill search on the augmented sampled data until the maximum number of optimization cycles is attained. At the end, the optimal design is obtained along with a set of n sub-data as well as their corresponding surrogate models for the given objective and constraint function.

Figure 2.

Flowchart of the MDEGO algorithm.

Extraction strategy of sub-data

In the MDEGO algorithm, the extraction strategy of the sub-data is improved to be suitable for multiple data. In detail, with the MDEGO algorithm, given the initial global data set ( X , Y ) with n points, the maximum number of subsets extracted from ( X , Y ) can be defined as

n_{subset} = \sum_{i = m}^{n} C_{n}^{i}, \quad m = \begin{cases} \frac{n}{2}, & \text{if } n \text{ is even} \\ \frac{n + 1}{2}, & \text{if } n \text{ is odd} \end{cases}

(7)

where $C_{n}^{i}$ represents the number of subsets containing i selected points from the global data. If i is taken to equal 8 and n is equal to 10, for example, then 45 (calculated by $C_{10}^{8}$ ) different subsets containing eight points can be extracted from the global data (10 points). It is clear that the more the points contained within the sub-data, the more information from the global data is used to construct surrogates, also leading to a lower number of subsets. For simplicity and robustness, this work employs the extraction strategy where i is constantly equal to the value of n−1, and then $C_{n}^{n - 1}$ (equal to n) different subsets will be generated for the purpose of diversity. It can also be interpreted that each subset is constructed by omitting only one point from the global data. As shown in Figure 2, $s_{2}$ is the sub-data extracted from the global data when the second point is taken out.
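The counts in equation (7) and the leave-one-out extraction used in this work can be reproduced in a few lines. This is a minimal sketch; the function names are illustrative, and the code uses 0-based indices, so the sub-data called $s_{2}$ in the text corresponds to index 1.

```python
import math

def max_subset_count(n):
    """Equation (7): number of subsets that keep at least about half of the n points."""
    m = n // 2 if n % 2 == 0 else (n + 1) // 2
    return sum(math.comb(n, i) for i in range(m, n + 1))

def leave_one_out_subsets(X, Y):
    """The n sub-data sets with i = n - 1: sub-data i omits the i-th global point."""
    n = len(X)
    return [
        ([X[j] for j in range(n) if j != i], [Y[j] for j in range(n) if j != i])
        for i in range(n)
    ]

print(math.comb(10, 8))        # subsets of eight points out of ten
print(max_subset_count(10))    # upper bound of equation (7) for n = 10

X_demo = [[0.0], [1.0], [2.0], [3.0]]
Y_demo = [0.0, 1.0, 4.0, 9.0]
subsets_demo = leave_one_out_subsets(X_demo, Y_demo)
print(len(subsets_demo), subsets_demo[1])
```

Choosing i = n − 1 keeps the number of surrogates equal to n, whereas the upper bound of equation (7) grows very quickly with n.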

Selection strategy of infill points

In the context of the MDEGO algorithm, each sub-data is fitted by a Kriging model, and one infill point is provided by maximizing the expected improvement. The information about the distance between the infill point and the existing points is used by the selection criterion, determining whether the infill point will be added to the original global data or sub-data for the next optimization cycle. If the distance is greater than a small tolerance $ε$ , then the infill point will be used to update the original data. The distance between the ith infill point, which is generated by the surrogate of the ith sub-data, and the jth existing point from the original data is defined as

d_{ij} = \sqrt{\sum_{k = 1}^{\dim} {[xnew_{i} (k) - X_{i} (j, k)]}^{2}}

(8)

where dim denotes the dimension of the design variable and $X_{i}$ represents the ith sub-data. The selection strategy can be summarized in Figure 3, and the meanings of the parameters are listed in Table 1.

Figure 3.

Selection strategy: (a) selection strategy for sub-data and (b) selection strategy for global data.

Table 1.

Parameters for selection strategy.

Parameters	Meaning
$X_{i}$	The ith sub-data
$X$	The global data
$xnew_{i}$	The infill point generated by the ith sub-data
$d_{ij}$	Distance between the ith infill point and the jth existing point

For simplicity, in this article, the value of the small tolerance $ε$ has been set to zero, so the infill point is only added to the original data when it does not coincide with any other existing points.
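The selection criterion amounts to a minimum-distance test against the existing points. The sketch below is an illustration in Python with hypothetical function names; the same function can be applied to a sub-data set or to the global data, mirroring panels (a) and (b) of Figure 3.

```python
import math

def min_distance(x_new, points):
    """Smallest distance d_ij of equation (8) between x_new and the existing points."""
    return min(math.dist(x_new, p) for p in points)

def add_if_new(x_new, y_new, X, Y, eps=0.0):
    """Append (x_new, y_new) to (X, Y) only if x_new is farther than eps from every point."""
    if not X or min_distance(x_new, X) > eps:
        X.append(list(x_new))
        Y.append(y_new)
        return True
    return False

X_demo = [[0.0, 0.0], [1.0, 1.0]]
Y_demo = [0.0, 2.0]
rejected = add_if_new([1.0, 1.0], 2.0, X_demo, Y_demo)   # coincides with a point
accepted = add_if_new([0.5, 0.5], 1.0, X_demo, Y_demo)   # genuinely new point
print(rejected, accepted, len(X_demo))
```

With eps = 0, as used in this article, only exact duplicates are rejected; a positive eps would also discard near-duplicates, which can help keep the correlation matrix well conditioned.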

Accelerated MDEGO algorithm by parallel computing

According to the above analyses, the MDEGO algorithm can achieve better results than the original MSEGO algorithm. However, it still has potential drawbacks that must be addressed, especially for high-order and large-scale problems in practical applications. Since parallel computing is increasingly used in modern simulations, a parallel computing scheme for the MDEGO algorithm is considered next.

GPU-based parallel multi-data strategy

As mentioned above, parallelizability is a major advantage of this algorithm. To a large extent, extracting and updating different sub-data can be performed independently of each other, so these calculations can easily be conducted in parallel by mapping sub-data to CUDA threads with a one-to-one relationship. That is, one CUDA thread performs the calculation for one sub-data. For example, the GTX 970 GPUs used in this work have 1664 CUDA cores and can execute 1664 CUDA threads concurrently. Therefore, theoretically, 1664 sub-data can be calculated at the same time. The parallel multi-data procedure used in our MDEGO algorithm is described as follows.

Step 1: Extract the multiple sub-data $S_{i}$ from $P (X, Y)$ in parallel on the GPU using the extraction strategy.

Step 2: Fit a Kriging model for each sub-data $S_{i}$ in parallel on the GPU and obtain one infill point per surrogate by maximizing its expected improvement.

Step 3: Update each $S_{i}$ in parallel on the GPU with its new point if that point does not coincide with any existing point within this sub-data.

Step 4: The host CPU updates the global data set with the set of new points, provided these points do not coincide with any existing points within the global data.

Step 5: Repeat steps 1–4 until the maximum number of optimization cycles is achieved.

Since a modern CUDA-enabled GPU usually contains more than 1000 CUDA cores, this fine-grained parallelism could apply to most high-order and large-scale problems.
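The control flow of steps 1 to 5 can be sketched on the CPU. The Python example below is only an illustration under strong assumptions: a thread pool stands in for the CUDA threads, `objective` is a toy function, and `propose_infill` is a hypothetical placeholder (the midpoint of the two best points of a sub-data set) used instead of fitting a Kriging model and maximizing the expected improvement. The sketch also extracts the sub-data once and then keeps updating them, which is one possible reading of the steps.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def objective(x):
    """Toy high-fidelity model (assumption): a shifted sphere function."""
    return sum((xi - 0.3) ** 2 for xi in x)

def propose_infill(sub):
    """Placeholder for step 2 (NOT Kriging/EI): midpoint of the two best points."""
    X, Y = sub
    order = sorted(range(len(X)), key=lambda i: Y[i])
    a, b = X[order[0]], X[order[1]]
    return [(u + v) / 2.0 for u, v in zip(a, b)]

def is_new(x, X, eps=0.0):
    """Selection criterion: x must be farther than eps from every existing point."""
    return all(math.dist(x, p) > eps for p in X)

def mdego_cycles(X0, Y0, n_cycles, workers=4):
    X, Y = [list(x) for x in X0], list(Y0)
    n = len(X)
    # Step 1: sub-data i omits global point i (leave-one-out extraction).
    subs = [([X[j] for j in range(n) if j != i],
             [Y[j] for j in range(n) if j != i]) for i in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n_cycles):                      # step 5: repeat
            # Step 2: one independent infill search per sub-data set.
            infills = list(pool.map(propose_infill, subs))
            # Forward problems for the infill points are independent as well.
            values = list(pool.map(objective, infills))
            for (sub_X, sub_Y), x_new, y_new in zip(subs, infills, values):
                if is_new(x_new, sub_X):               # step 3: update sub-data
                    sub_X.append(x_new)
                    sub_Y.append(y_new)
                if is_new(x_new, X):                   # step 4: update global data
                    X.append(x_new)
                    Y.append(y_new)
    best = min(range(len(X)), key=lambda i: Y[i])
    return X[best], Y[best], len(X)

X_init = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y_init = [objective(x) for x in X_init]
best_x, best_y, n_points = mdego_cycles(X_init, Y_init, n_cycles=3)
print(best_x, best_y, n_points)
```

Only the updates in steps 3 and 4 touch shared data, and they run on the host after all workers have finished, so the parallel part needs no locking.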

Parallel computing of forward problems

The efficiency and stability of forward problems are important in practical engineering optimum design. Therefore, two kinds of parallel computing strategy are presented in this article to shorten the time of calculation of direct problems.

First, for function optimization problems, the direct problems can be easily expressed as data-parallel computations, and the computational granularity is small enough for GPU-based fine-grain parallel computing. As shown in Figure 4, the parallel strategy is to map the response values of direct problems to parallel processing threads, one by one, and then one CUDA thread is assigned to calculate one response value.

Figure 4.

Parallel computing of response values based on GPU.

Furthermore, a special dynamic parallel computing method is proposed for complex polynomial functions. Take a D-dimensional function for example and let the number of initial data be $N_{s}$ , as shown in Figure 5. In the first step, a total of $N_{s} \cdot D$ CUDA threads are assigned to parallel calculation of these function values in each dimension. In the second step, a total of $N_{s}$ CUDA threads are used to assemble these dimensional values to obtain the final function values. At this point, a novel decomposition strategy is presented to reduce the computation complexity. This strategy decomposes the target polynomial function into several monomial functions and then codes kernel functions to solve these monomial functions. Through this decomposition strategy, the efficiency of GPU computing can be greatly improved because the computing load of each CUDA thread is significantly decreased.

Figure 5.

Dynamic parallel computing of a high-dimension function.
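The two-step scheme can be illustrated with a serial Python sketch in which each loop iteration plays the role of one CUDA thread. The separable polynomial used here, with one monomial `coeff * x_d ** power` per dimension, is an assumption chosen for illustration, and the function names are hypothetical.

```python
def monomial_term(x_d, coeff, power):
    """Work item of step 1: one monomial of one sample (one CUDA thread in the text)."""
    return coeff * x_d ** power

def evaluate_decomposed(samples, coeffs, powers):
    """Evaluate f(x) = sum_d coeffs[d] * x_d ** powers[d] for every sample."""
    n_s, dim = len(samples), len(coeffs)
    # Step 1: N_s * D independent work items, indexed like a flat thread ID.
    terms = [0.0] * (n_s * dim)
    for tid in range(n_s * dim):
        s, d = divmod(tid, dim)
        terms[tid] = monomial_term(samples[s][d], coeffs[d], powers[d])
    # Step 2: N_s independent work items, each assembling one function value.
    return [sum(terms[s * dim:(s + 1) * dim]) for s in range(n_s)]

samples_demo = [[1.0, 2.0, 3.0], [0.0, 1.0, -1.0]]
values_demo = evaluate_decomposed(samples_demo, coeffs=[1.0, 2.0, 3.0], powers=[2, 2, 2])
print(values_demo)
```

Because no work item in step 1 depends on another, the per-thread load is a single monomial, which is the reduction in computing load that the decomposition strategy aims for.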

Second, practical optimization problems usually involve many repeated full FE analyses, for which simple GPU thread-level parallelism is no longer applicable. Therefore, this article adopts a hybrid parallel computing approach using open multi-processing (OpenMP) and the GPU. This hybrid approach contains two stages. The first stage is multithreaded parallel computing on a multi-core CPU using OpenMP. In other words, one core of the multi-core CPU is assigned to the solution of one subset, so all subsets can be solved in parallel. This parallel computing is easy to realize: we only need to call routines from the OpenMP thread library and insert the OpenMP compiler directives.³⁹ However, simple OpenMP-based parallel computing is not powerful enough to clearly improve the performance of FE analysis. Therefore, a series of GPU-based parallel FE simulation systems are introduced as the second stage. These high-efficiency simulation systems are our own developments with independent intellectual property rights and cover many fields of solid mechanics.^17,29,40 All in all, this hybrid parallel strategy uses one OpenMP thread to handle one subset and to control one GPU that solves one forward problem.
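The division of labour in the hybrid strategy can be sketched as follows. This Python example is a rough analogue only: worker threads stand in for OpenMP threads, `solve_forward_on_gpu` is a hypothetical placeholder for one GPU-based FE analysis, and the round-robin assignment of subsets to devices is an assumption, since the text does not say how subsets are distributed when there are more subsets than GPUs.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_forward_on_gpu(device_id, design):
    """Hypothetical placeholder for one GPU-based FE analysis of one design."""
    return {"device": device_id, "response": sum(v * v for v in design)}

def solve_subsets(designs, n_gpus=2):
    """One worker thread per subset; worker i drives GPU i % n_gpus (assumed mapping)."""
    with ThreadPoolExecutor(max_workers=max(1, len(designs))) as pool:
        futures = [pool.submit(solve_forward_on_gpu, i % n_gpus, d)
                   for i, d in enumerate(designs)]
        # Collect results in submission order so result i belongs to subset i.
        return [f.result() for f in futures]

results_demo = solve_subsets([[1.0], [2.0], [3.0]], n_gpus=2)
print(results_demo)
```

The coarse-grained level (threads over subsets) and the fine-grained level (GPU threads inside one analysis) are independent of each other, so either can be tuned without changing the other.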

Flowchart of the parallel MDEGO algorithm

The flowchart of the parallel MDEGO scheme is illustrated in Figure 6. The evaluations are performed by the hybrid parallel strategy, with one GPU assigned to accelerate one FE analysis or function computation. In addition, the EGO iterative calculation is performed in fine-grained parallel on the GPU.

Figure 6.

Diagram of the parallel MDEGO approach.

Numerical examples

In this section, the MDEGO algorithm is applied to both functional and automotive optimal design problems. The mathematical problems are selected to test the convergence of the proposed multiple-data approach. The examples for automotive design are used to test the computational efficiency of the parallel MDEGO algorithm.

Function tests

First, three widely used global optimization test functions are selected to test the predictive capabilities of the MDEGO algorithm: the Ackley function, the Hartman function, and the Dixon & Price function. To guarantee the validity of the test, the number of subsets used in the MDEGO algorithm equals the number of surrogates employed in the MSEGO algorithm in most cases. However, since too many surrogates would dramatically increase the computation cost for high-order functions, only three surrogates are considered for the 10-dimensional Dixon & Price function. The detailed test setup is summarized in Table 2.

Table 2.

The detailed test methods.

Function	Dimension	Number of subsets	Surrogates for MSEGO
Ackley function	2	6	Kriging, svr_Gaussianrbf, svr_polynomial, svr_exponentialrbf, svr_anovaspline-1, shepard
Dixon & Price function	2	6
Hartman function	3	6
Dixon & Price function	10	100	Kriging, svr_Gaussianrbf, shepard

MSEGO: multiple-surrogate efficient global optimization

The Ackley function

The Ackley function is described as follows

f (x) = - 20 e^{- 0.2 \sqrt{\frac{1}{n} \sum_{i = 1}^{n} x^{2} (i)}} - e^{\frac{1}{n} \sum_{i = 1}^{n} \cos [2 π x (i)]}, \quad x (i) \in [- 15, 30]

(9)
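For reference, equation (9) can be evaluated with the short Python sketch below. The function name and the `add_constant` flag are illustrative: many textbook statements of the Ackley function add the constant 20 + e so that the global minimum value is zero, whereas equation (9) as written omits it, which shifts the values but not the location of the minimum.

```python
import math

def ackley(x, add_constant=False):
    """Equation (9); add_constant=True adds 20 + e (common textbook form)."""
    n = len(x)
    mean_sq = sum(xi * xi for xi in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * xi) for xi in x) / n
    value = -20.0 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos)
    return value + 20.0 + math.e if add_constant else value

# The global minimizer is the origin, which lies inside the box [-15, 30]^n.
print(ackley([0.0, 0.0]), ackley([0.0, 0.0], add_constant=True))
print(ackley([1.0, 1.0]))
```

The many cosine-induced local minima around the origin are what make this function a useful test of the global search behaviour.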

Figure 7 shows the convergence for the Ackley function. The MDEGO algorithm converges better than the MSEGO algorithm for the same number of optimization iterations. In detail, the response value of the MDEGO algorithm is better than that of the MSEGO algorithm at every iterative step. More importantly, the response value of the MDEGO algorithm at the 4th iterative step is already better than that of the MSEGO algorithm at the 21st step. Since six infill points are added in every iterative step of the MDEGO algorithm, a total of 126 evaluations are performed, the same as for the MSEGO algorithm; therefore, the number of evaluations does not grow.

Figure 7.

Convergence of Ackley’s function.

Dixon & Price function

The Dixon & Price function is described as follows

f (x) = {[x (1) - 1]}^{2} + \sum_{i = 2}^{n} i [2 x^{2} (i) - x^{2} (i - 1)], \quad x (i) \in [- 15, 30]

(10)

The convergence histories of the two-dimensional (2D) case and the 10-dimensional (10D) case are shown in Figures 8 and 9, respectively. The conclusions are similar to those for the Ackley function: the MDEGO algorithm consistently finds the global minimum more quickly than the MSEGO algorithm. Note as well that calculating multiple initial points is less difficult than constructing multiple surrogates, so the computation cost for these functions is effectively reduced.

Figure 8.

Convergence of 2D Dixon & Price function.

Figure 9.

Convergence of 10D Dixon & Price function.

Hartman function

Finally, the Hartman function is considered as follows

f (x) = - \sum_{i = 1}^{4} c_{i} \exp [- \sum_{j = 1}^{3} a_{ij} {(x_{j} - p_{ij})}^{2}], \quad x_{j} \in [0, 1]

(11)

The convergence of the six-dimensional (6D) case is shown in Figure 10. Unlike in the previous cases, the response values of the MDEGO algorithm are not always better than those of the MSEGO algorithm during the iterations. In the beginning, the MSEGO algorithm converges faster than the MDEGO algorithm, but as convergence is approached, the MDEGO algorithm finally obtains a better response value. Moreover, the MSEGO algorithm does not reach its optimal value at the same point as the MDEGO algorithm.

Figure 10.

Convergence of the 6D Hartman function.

Automotive design optimization

In this section, the parallel MDEGO algorithm is applied to two practical engineering optimization problems. As mentioned above, our GPU-based FE analysis systems are introduced to improve calculation efficiency. These comparisons are performed on a personal computer containing one Intel Core i7-930 CPU and two NVIDIA GTX 970 GPUs. Each GTX 970 has 1664 CUDA cores running at 1.114 GHz that share 4 GB of global memory. Furthermore, the parallel program is written in CUDA, version 7.0, and the NVIDIA GPU driver version is 347.62.

Automobile B-pillar

First, an optimization example for an automobile B-pillar is performed with the target function being the three-point bending stiffness of the part. The FE model of this optimization problem is shown in Figure 11. The FE model is composed of 2472 nodes and 7118 elements, involving 42,708 degrees of freedom (DOFs). The modulus of elasticity is E = 200 GPa, and Poisson’s ratio is v = 0.3. The thickness of the car body is t = 1.0 mm. As shown in Figure 11, a concentrated force of 600 N is applied vertically downward on the red node, and all DOFs of the green nodes are constrained. The downward concentrated forces are loaded on the red nodes, and the simply supported constraints are loaded on the green nodes.

Figure 11.

FE model of the automobile B-pillar.

As shown in Figure 12, the optimization goal is to obtain the optimal three-point bending stiffness by modifying the shape of the section lines. The shape of these section lines is controlled by several key points, including point A and point B; therefore, this case takes the positions of points A and B as the design variables. The Ω_A and Ω_B regions in Figure 12 are the design spaces in which A and B are respectively located.

Figure 12.

Optimal model and its optimal result: (a) section line and (b) optimized region and optimal result.

In this case, six initial points are calculated in the MDEGO algorithm, and three surrogates are constructed in the MSEGO algorithm, including Kriging, svr_Gaussianrbf, and shepard. The convergence of the parallel MDEGO algorithm and the MSEGO algorithm for this case is shown in Figure 13. The optimal result of the MDEGO algorithm is clearly much better than that of the MSEGO algorithm. The final optimal shape line is shown in Figure 12.

Figure 13.

Comparison for an automobile B-pillar.

In addition, to confirm the calculation efficiency of the parallel MDEGO algorithm, the original serial MDEGO algorithm is also run. Comparing the two runs gives a 31-fold speedup, where the speedup is calculated by dividing the CPU time consumed by the original serial process by the total GPU time consumed by the parallel process.

Automobile door lightweight design

In this case, a design problem for a lightweight automobile door is considered. A lightening hole is drilled to decrease the weight. The FE model is shown in Figure 14. The FE model includes 86,895 nodes and 170,282 elements that involve 521,370 DOFs. The boundary conditions are as follows: the translational motion along the x, y, and z directions is restrained for the nodes at the hinge locations, and the translational motion along the y direction is restrained for the node at the location of the key cylinder. A force of 800 N is applied vertically downward on the node at the location of the key cylinder. The modulus of elasticity is E = 200 GPa, and Poisson’s ratio is v = 0.3. The thickness of the car body is t = 1.5 mm. The optimization goal is to maximize the stiffness and minimize the weight by modifying the diameter of the lightening hole.

Figure 14.

FE model of automobile door.

Similarly, the convergence of the MDEGO algorithm and the MSEGO algorithm is compared in Figure 15. The MDEGO algorithm still shows better convergence than the MSEGO algorithm. Furthermore, a 95-fold increase in speed is obtained for this case, because the FE model is more complex than the B-pillar case.

Figure 15.

Comparison for lightweight automobile door design.

Conclusion

In this article, a multiple-data-based optimization method and its parallel implementation are proposed for global optimization problems. For the mathematical test functions considered, the MDEGO algorithm obtains a better final response value than the MSEGO algorithm in every case. Furthermore, since the ultimate purpose of this work is to improve the usability of the global optimization technique for automotive body design, a B-pillar and a car door optimal design problem are treated. The optimal results demonstrate that, compared to the MSEGO algorithm, the suggested MDEGO algorithm is better suited for both mathematical and practical engineering problems. More importantly, exploiting the inherent parallelism of the MDEGO algorithm, a parallel GPU-based MDEGO algorithm is proposed. This method uses OpenMP to perform the parallel calculation of each sub-data and uses the GPU to perform the parallel calculation of response values. The GPU-based parallel computing uses a fine-grained strategy that maps CUDA threads to sub-data. At the same time, for the engineering optimization problems, this article also uses the GPU to accelerate the calculation of forward problems. The examples show that the calculation time for the practical problems is clearly shortened by the hybrid OpenMP and CUDA parallel computing scheme proposed for the MDEGO algorithm. A 31-fold increase in speed is obtained for the simple B-pillar optimal design problem, and a 95-fold increase in speed is obtained for the automobile door lightening case.

The main advantages of the parallel MDEGO algorithm are as follows: (1) it makes full use of the diversity of data groupings and converges faster than the MSEGO algorithm, which is based on multiple models; and (2) it is well suited to parallel computing. It increases the calculation speed while preserving the convergence speed, so that it can be applied to most high-order and large-scale problems.

In the near future, we will investigate more complex cases, such as optimizing an entire vehicle body structure with the MDEGO algorithm in a short time, using a hybrid of the message passing interface (MPI), OpenMP, and CUDA.

Footnotes

Handling Editor: Michal Kuciej

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China under grant number 11702090 and the Project of the Key Programme of the National Natural Science Foundation of China under grant number 61232014. The Project is supported by the State Key Laboratory of Materials Processing and Die & Mould Technology.

ORCID iD

Yong Cai


A multiple-data-based efficient global optimization algorithm and its parallel implementation for automotive body design
