Abstract
The hybrid atomistic-continuum coupling method based on domain decomposition serves as an important tool for the micro-fluid simulation. There exists a certain degree of parallelism load imbalance when directly using the USHER algorithm in the domain decomposition–based hybrid atomistic-continuum coupling method. In this article, we propose a grid-based parallel algorithm for particle insertion, named WI-USHER, to improve the efficiency of the particle insertion operation when restricting the size of the region to be inserted or with higher number density. The WI-USHER algorithm slices the region to be inserted into finer grids with proper spacing scale, marks parts of finer grids in black according to three exclusive rules, that is, Single Particle Occupation (SPO), Single Particle Coverage (SPC), and Multi-Particles Coverage (MPC), and finds the target insertion point in the remained white grids. We use two test cases to show the superiority of our WI-USHER algorithm over the USHER algorithm. The WI-USHER algorithm performs lower averaged force evaluation times, which decreases from
Introduction
In recent years, with the rapid development of nano-technology, micro/nano-scale devices such as micro-electromechanical system (MEMS) devices and lab-on-a-chip devices have been widely used. The fluid flows in these devices involve a broad range of scales from the atomistic scale to the macroscopic scale.1–3 Simulation of such devices, that is, MEMS and lab-on-a-chip, cannot be carried out relying solely on the continuum models, in particular, the most commonly used Navier–Stokes (NS) equations. The NS equations become generally invalid when the continuum or thermal equilibrium condition breaks down. Molecular techniques, such as molecular dynamics (MD), can describe materials from the nanoscopic perspective and model the physics accurately at these scales. However, due to the computationally expensiveness, they cannot be directly used as a design tool for engineering nano- and micro-scale fluidic systems. In order to investigate these phenomena, researchers usually use the multi-scale modeling methods. The advantages of the multi-scale modeling lie in the ability of revealing properties of multi-scale systems by capturing phenomena that appear a wide range of time and length scales which beyond any single solver and method. 4 Especially, the hybrid atomistic-continuum method (HAC) 5 based on geometric coupling has seen rapid development since it allows on-the-fly information exchange and interaction between multiple simulation regions. The HAC method based on the geometric coupling obtains both simulation accuracy and efficiency, which calculating the majority of the domain by fast continuum solvers (computational fluid dynamics (CFD)) which are several orders of magnitude faster than molecular simulation methods (MD) can speed up the computation tremendously.
The HAC method splits the simulation domain into the continuum region, the atomistic region, and the overlap region, which exchanges the simulation data between the previous two regions to keep the physical properties consistently, that is, density, velocity, temperature, and so on. The top of the overlap region is the junction of the atomistic region and the continuum region, which can be regarded as an open MD simulation boundary. The MD simulation of the open boundary needs to consider the mass, momentum, and energy transfer, which is also an important feature of the simulation of non-equilibrium problem using the coupling method. One of the biggest challenges faced is the problem of the efficient insertion of particles on the open boundary in the dense liquids. 6 To deal with the on-the-fly exchanging matter with surroundings, researchers focus on finding the efficient insertion of particles in dense liquids.
In previous studies, the most famous particle insertion algorithm is the USHER algorithm6,7 for single particle insertion in dense fluid. For single particle insertion, the USHER algorithm randomly chooses the initial position, performs an iterative search method, which is similar to the Newton–Raphson method to find the target point based on the local potential landscape. It is default a sequential algorithm. After this, the improved algorithm of particle insertion algorithm is very little until year 2014. The FADE mass-stating technique 8 uses a weighting function to insert and delete particles with a relaxation time. It inserts particles when the weight of the particle becomes 1 and deletes particles when becomes 0. The FADE mass-stating suits for the single particle, molecular, and even polymer. The performance of the FADE mass-stating technique matches the USHER algorithm and it is a parallel algorithm.
Nevertheless, there are still some issues to be investigated when directly using the USHER algorithm in the simulation. First, in the actual test, when the number density of the region to be inserted is larger than 0.8, the performance of the USHER algorithm decreases fast. The algorithm may not find the target position matching the threshold when reaching the maximum iteration times. Second, when we apply the USHER algorithm into the HAC coupling framework 9 because of the limitation of the insertion volume, the USHER algorithm appeals the same deficiency performance as the high value of number density. Third, there will be multiple regions to be inserted particles into in the parallel HAC simulation. After each particle insertion operation, the whole simulation system has to update the total particle number to each process. Different insertion performance of each process results in long-time synchronization barrier. The imbalance of parallel insertion operation leads decreasing in parallel simulation performance. Therefore, there is an urgent need to improve the performance of the particle insertion operation.
In this article, focusing on the computational efficiency of the particle insertion operation in the hybrid coupling simulation, we propose a grid-based parallel particle insertion algorithm modified from the famous insertion algorithm USHER. 6 Specifically, the main contributions of this article are summarized as follows:
We analyze the deficiency of the original USHER algorithm, formalize the problem of particle insertion, and put forward the solution to the deficiency of the particle insertion.
We propose the WI-USHER parallel algorithm for the HAC method. The algorithm slices the region to be inserted into finer grids, marks some grids black according to three exclusive rules, that is, SPO, SPC, and MPC, and uses the steepest descent method to find the target position in the remained white grids.
We implement the WI-USHER algorithm in our HAC coupling framework based on OpenFOAM
10
and LAMMPS.
11
Comparing with the USHER algorithm under the simulation of the number density from slightly high value to high value, the averaged force evaluation times of the WI-USHER algorithm decreases an order of magnitude from
The remainder of the article is organized as follows: section “Preliminary” reviews the HAC simulation methodology and the particle insertion algorithm. Section “Basic idea of White Initial USHER parallel algorithm” uses two cases to present the deficiency of the original USHER algorithm, formalizes the particle insertion problem, and proposes the basic idea of our WI-USHER algorithm. Section “Algorithm and implementation” presents the key exclusive rules of the WI-USHER parallel algorithm and gives the design and implementation of the algorithm based on OpenFOAM and LAMMPS. Section “Experiments” evaluates the efficiency of our WI-USHER compared with the USHER algorithm using two benchmark cases. Finally, Section “Conclusions” concludes the article.
Preliminary
In this section, we review the HAC coupling method and framework based on domain decomposition and the existing algorithm of particle insertion operation.
Hybrid coupling computational framework
With the rapid development of high-performance computing (HPC), the simulation ability of classical MD is extended but still cannot fulfill very large-scale particle simulation application. HAC method limits the use of MD simulation to regions where the atomistic scales needed to be resolved, while using a continuum-based solver for the remainder of the domain, which obtain both simulation accuracy and efficiency.
The first hybrid method combining MD with the continuum method for dense fluid was proposed by O’Connell and Thompson. 5 For the two-dimensional (2D) Couette flow, the domain was split into an atomistic region and a continuum region, using an overlap region to alleviate dramatic density oscillation and couple the results of these two regions. The outer boundary of the continuum region which resides in the atomistic region is called hybrid solution interface (HSI). Later, HAC models emerged differing in forms of coupling strategies, boundary conditions extraction, and non-periodic boundary force models.4,12,13 We name the atomistic region and the overlap region unified as the particle region for that MD simulation performing on both of these two regions. We use the abbreviation “A” for the “atomistic” and “C” for “continuum.” Generally, the overlap region contains a control region (control region), a buffer region (buffer region), an atomistic-coupled-to-continuum-region (A → C region), and a continuum-coupled-to-atomistic region (C → A region). In the control region, there is a reflecting wall boundary condition to prevent particles leaving the atomistic region freely and the outer boundary is the MD–continuum interface. The schematic diagram of the domain decomposition–based HAC method is depicted in Figure 1.

Domain decomposition of benchmark case, channel flow past the rough wall and detailed of the overlap region. “A” is the abbreviation for “atomistic,” while “C” is for “continuum.” Yellow dots represent fluid particles and black dots for solid wall particles.
In order to facilitate data exchanging between the continuum solver and the atomistic solver, the overlap region is sliced into bins to match the cells in the continuum region as shown in Figure 2. There are four layers of bins which are corresponding to the control region, C → A region, buffer region, and A → C region. In the continuum region, we use the incompressible NS equation for modeling the system. The NS equation and the continuity equation are given by
where

Diagram of particle insertion occurring on the atomistic-continuum interface.
In the particle simulation, we use the truncated and shifted Lennard–Jones (LJ) potential to model the interactions between fluid particles and wall particles. The potential is given by
where
The parallel hybrid coupling framework based on the domain decomposition consists of the continuum solver (CFD solver), the atomistic solver (MD solver), and relatively coupling operations. The parallelization of the continuum part is based on mesh decomposition, while the atomistic part is on simulation domain decomposition to allocate sub-mesh and sub-domain on each process. Reasonable parallel coupling operations play an important role on efficient coupling simulation. There are also different multi-scale coupling methods, that is, finite volume method and lattice Boltzmann method, 14 finite volume method, and Brownian configuration method. 15 The existing coupling method frameworks include MaMiCo. 16
Particle insertion algorithm
In order to exchange mass flux with the open boundary, the USHER algorithm6,7 uses a Newton–Raphson liked method to iteratively find the target position according to the local potential landscape. Many existing research deals with the mass flux using this method.17,18 The target position of a new inserted particle is where the potential is equal to the average potential of the whole region to be inserted or within a pre-set threshold of the average potential. In practice, the pre-set threshold is usually a small value, that is, 0.001. One attempt of insertion involves first keeping all the other particles in the region frozen in space and time. A new trial particle is then inserted randomly in the region and is shifted away from the overlapping core of resident particles in an iterative manner, guided by the strength and slope of the static potential energy landscape. Once the target position is found, the new particle is assigned with velocity drawn from the Maxwell-Damon distribution according to the temperature of the whole region.
In the HAC method based on the domain decomposition, the MD–continuum interface needs the particle insertion algorithm to exchange mass flux between the continuum region and the atomistic region. When applying the USHER algorithm into the coupling method, a control region should be introduced to complete this operation as depicted in Figure 1. Furthermore, in order to complete data exchanging, the overlap region has to be sliced into bins to match the mesh cells of the continuum region as shown in Figure 2.
In previous studies, for the mass exchange of the open boundary, the conventional method is to convert the mass flux into the number of particles using a converted equation and utilize the insertion and deletion particles of the MD region to characterize the exchange of mass flux. In one CFD time-step, the number of particle to be inserted or deleted is calculated using equation (4)
where
When continuously inserting particles into the region, the USHER algorithm needs to recalculate the region energy every insertion, because of that after each new insertion, the local potential landscape changes with every insertion. Borg et al. 8 point out that the main disadvantages of the USHER algorithm are that it requires the correct selection and optimization of the input parameters to minimize computational cost and maximize insertion rate, but they do not further investigate these disadvantages.
In this article, we modify the USHER algorithm to apply it into our HAC coupling framework, 9 analyze the deficiency of the USHER algorithm, and make the particle insertion operation more efficiency for the coupling simulation.
Basic idea of White Initial USHER parallel algorithm
In this section, we present the basic idea of our WI-USHER parallel algorithm. Subsection “Deficiency of the USHER algorithm” uses two cases to reveal the deficiency of the original USHER algorithm. Section “Formulation of particle insertion algorithm” formalizes the particle insertion problem. Section “WI-USHER parallel algorithm” gives the basic idea of the exclusive method.
Deficiency of the USHER algorithm
The authors of the USHER algorithm choose the initial start position totally randomly. In order to test the insertion efficiency of the USHER algorithm, Delgado-Buscalioni and Coveney 6 choose constantly inserting into a periodic box. In this test case, the volume of the simulation domain keeps unchanged, there is no external thermo bath, and the motion of Argon particles is governed by Newtonian equation. We initialize the same periodic box 10 times with different random numbers. The pre-set threshold is set as 0.001 and the maximum iteration times of iteratively searching are set as 5 × 104. The initial number density is 0.8 and we insert new particles into the periodic box with a constant insertion rate until the number density reaches 1.0. The algorithm will reach the maximum iteration times when it cannot find the position with potential that satisfies the threshold. The force evaluation times needed by each insertion is categorized in Table 1.
Distribution of force evaluation times with maximum iteration times 5 × 104.
From Table 1, we find that the force evaluation times that reaches the maximum iteration times is with the probability of 36.5% as the number density of the inserted region increases. The force evaluation times that reaches the maximum number means the USHER algorithm does not find the target position meeting the threshold but find sub-optimal position. Moreover, in order to exclude the effect of the maximum iteration times, we choose another pre-set maximum iteration times 1 × 106. The test results are shown in Table 2.
Distribution of force evaluation times with maximum iteration times 1 × 106.
Due to a different maximum iteration times, when one insertion attempt reaches the maximum, there will be different sub-optimal positions among different maximum times, and later insertions will be slightly different. We find that even increasing the maximum iteration times, in some insertion attempts, it cannot find the optimal position yet.
Especially, the deficiency of the USHER algorithm becomes worse when applied into the HAC method. During the parallel coupling simulation, there would be multi-regions to be inserted as the same insertion operation. After each coupling operation, all the processes have to be synchronized because of the variation of total particle number. If the USHER algorithm in one of the inserted regions reaches the maximum iteration times, the other processes have to wait until this process finishing iteration. During the whole coupling simulation, insertions probably happen on different process. Parallel insertion imbalance seriously impacts the parallel performance. The schematic of parallel insertion and synchronization is depicted in Figure 3 with four processes (Q0, Q1, Q2, and Q3).

Schematic diagram of parallel particle insertion and barrier with the degree of parallelism 4. Taking t to
We take a HAC case with 12 degree of parallelism as an example to depict the parallel insertion imbalance. The continuum region is decomposed into 4 × 3 sub-meshes and the particle region, that is, the atomistic region and the overlap region, is decomposed into 4 × 3 sub-domains. The parallel partition of the particle region is shown in Figure 4, which only process P2, P5, P8, and P11 would perform the particle insertion operation, named as inserted processes.

Schematic diagram of processes distribution in the atomistic region (gray) and the overlap region (light green) with the degree of parallelism 12.
The whole simulation time is 8 ns. The time step of MD is 0.005τ and the time step of CFD is 100 times the MD time step. The total time steps of MD is 8 × 105 and CFD is 8 × 103. The total computation time includes three parts: the CFD computation time
Time distribution of inserted processes.
Time distribution of each procedure.
From Tables 3 and 4, we find that the particle insertion operation time processes the main part of the coupling operation time due to unsuccessful insertions. Because only part of processes involved in the particle insertion, the long-time insertion and barrier of these processes destroy the parallel performance of the hybrid simulation.
We have discussed the deficiency of the USHER algorithm using two bench cases. The USHER algorithm that totally chooses the initial position randomly does not perform perfectly, but limits the parallel efficiency of the coupling simulation. Therefore, there is an urgent need to improve particle insertion performance.
Formulation of particle insertion algorithm
The present particle insertion algorithms are mostly based on the potential distribution in the region to be inserted and use the evolution of the local potential to iteratively find the appropriate position. In the USHER algorithm, the potential energy of the newly inserted particle is approximately equal to that of the excess potential energy per particle u the system, that is,
After continuously inserting into the periodic box, we find that the final target position is a position that the distances between the newly particle and surrounding particles are proper. This proper means there is an empty sphere that consists only the newly introduced particle. Under the hard sphere assumption,
19
the particle is presented with the center of the sphere
Therefore, if there is a sufficient large empty sphere in the region to be inserted, there will a target position in such an empty sphere that meets the requirement of the particle insertion algorithm that has no damage on the statistical features. We give such a lemma of the newly introduced particle.
Lemma 1
If there is sufficient large empty sphere in the region to be inserted, there must have such a position in the empty sphere that meets the potential requirement. On the contrary, if there is no such an empty sphere, under such configuration of the region to be inserted, there will have no target position.
In order to find a target position in a point source field, we can reference to the solution of finding the Largest Empty Sphere (LES) in three-dimensional (3D) (Largest Empty Circle (LEC) in 2D). The problem of finding the LES can be solved using Voronoi diagram. 20
LES problem
Given a set
where
For 2D LEC problem, the time and space complexity of using Voronoi diagram are
WI-USHER parallel algorithm
In the previous section, we formalize the problem of particle insertion and find that we can choose the newly particle position only according to the configuration of the existing particle positions. But, in 3D, the complexity of space and time is sufficiently high. Taking a system of 800 particles as an example, the complexity of finding a empty sphere matching the threshold is
As we know, the original USHER algorithm chooses the initial position totally randomly and not refers to the configuration of particles in the system. Therefore, we combine the idea of finding the LES and some exclusive rules, which exclude some parts of the region that exist no target positions and use the local potential landscape to find the target position in the remaining parts.
It is the starting point of the WI-USHER algorithm that eliminates those spatial region existing no target points, named impossible region and uses the remained white initial region to search the target. Through profiling the configuration of existing particles, the WI-USHER algorithm combines the particle effective sphere as shown in Figure 5 and excludes the impossible region. The effective sphere represents the ability of one particle marking the inserted region. For there is difficulty of describing the effective sphere in the algorithm, we use the finer grids method that slices the region to be inserted into multi finer grids. Each particle has its own belonging grid and the effective sphere that can cover the neighbor grids. Multi-particles have the union coverage of finer grids to further exclude impossible region. The detailed definition of the effective sphere is presented in section “Algorithm.”

Schematic diagram of the effective sphere for single particle. The grids that are not marked in black are the White initials for the WI-USHER algorithm.
We can find that in order to describe the configuration of the particles position information, we must introduce some data structures to maintain these information. If any inserted process reaches the maximum iteration times, it will delay the overall performance of the parallel simulation, so it is feasible to exchange the time efficiency method through spatial storage.
Algorithm and implementation
In this section, we present the detailed design of our WI-USHER algorithm and the implementation in our hybrid coupling framework. Section “Algorithm” gives the parameters involved in the algorithm and section “Data maintenance and implementation” presents the maintenance data info and the implementation in our hybrid coupling framework. We choose the liquid argon as the model fluid. The reason for selecting argon as the fluid for simulation is that the particles have well-established atomic potential interaction and it serves as a reference point for more complex fluid simulations. We choose the shifted LJ potential, whose parameters are
Algorithm
The key idea of the WI-USHER algorithm is to eliminate those impossible regions and to guide the location of the target points may exist more inclined. The algorithm slices the region to be inserted into finer grids. The direction of building finer grids is to slice each particle into one grid. The exclusive rules of the WI-USHER algorithm are SPO, SPC, and MPC which are discussed in the following sections.
Single particle occupation
In this article, we take the liquid argon as the model liquid and the LJ potential as the pair interaction potential as shown in Figure 6. From the distribution of the potential, we can see that under the hard sphere approximation, each particle has its reasonable radius and particles cannot get too close to each other. We give the definition of the healthy distance when the particle pair distance is larger than

Lennard–Jones 12-6 potential.
The spatial selection of finer grids is properly chosen that each particle belongs to only one grid. The first exclusive rule is SPO, which means the grid that one particle belongs to would not exist the target position. The procedure of the WI-USHER algorithm is slicing the region to be inserted to finer grids as shown in Figure 7 and then marking the grids in black that have particle belongs to as shown in Figure 8.

Initial finer grids for the inserted region, each finer grid occupies only one particle with proper spatial slicing.

Single Particle Occupation exclusive rule, each grid that occupied by the particle should be marked in black.
Single particle coverage
Based on the assumption of the hard sphere, each particle has its effective diameter. Therefore, we use the
In 3D problem, the measured distance is the largest distance between the eight corners of the grid and the particle. The procedure of the WI-USHER algorithm goes on to mark black to those grids within the coverage of a single particle, that is, the second exclusive rule SPC as shown in Figure 9.

Single Particle Coverage exclusive rule, each grid within one of the effective sphere should be marked in black.
Multi-particles coverage
In the previous two sections, we exclude grids according to the SPO and the SPC. Next, we discuss the union coverage of multi-particles to mark grids black.
Due to the effective coverage of a single particle in a sphere, the overlap of one sphere and one grid is measured whether the eight corners of the grid are in the sphere or not as shown in Figure 10. If one grid is covered by multi-particles to a certain value, it will be marked in black, that is, the third exclusive rule MPC.

Multi-Particles Coverage exclusive rule, each grid that covered by the multiple effective sphere to a certain value should be marked in black.
The detail of this overlapping measurement is described as follows. The proper spatial parameter of finer grids that we choose is contained only one particle in each grid and the largest distance between the center of one grid to the eight corners of the neighbor grid is larger than the effective radius
There are three situations that one gird covered by one particle effective coverage, we deal with them separately:
Totally covered. If the largest distance between the eight corners and the particle is smaller than
Totally uncovered. If the smallest distance between the eight corners and the particle is larger than
Partial covered. The largest distance between the eight corners of the grid and the particle is larger than
The pseudo-code of the particle coverage is shown in Algorithm 1.
After these three rounds of exclusion, the region that may exist target positions is narrowed to nearly 30% or less of the original region to be inserted. The WI-USHER algorithm will find the target position in the remained white initial grids. In the remained white initial grids, the WI-USHER algorithm will tally each of these grids, by finding a position in some grids with the potential equals to the average potential of the inserted region. The initial insertion point in each grid is randomly chosen and updated by Newton–Raphson method that only goes down the direction of the landscape of the energy potential in the inserted region. When the energy potential goes up, the WI-USHER algorithm will pass this grid and go on tally the other grids until finding the target position with the desired energy as the same method in Delgado-Buscalioni and Coveney. 6
Data maintenance and implementation
When the insertion happens during the simulation, particles in the region to be inserted are varied each time, so the relationship between particles and finer grids is needed to recalculate each insertion. The WI_USHER algorithm needs certain data structures to maintain the relationship. In our hybrid coupling framework, we implement the parallel version of the WI-USHER algorithm to meet the simulation requirement.
Definition of data maintenance
The WI-USHER algorithm needs to define the spatial info of finer grids and the neighbor relationship. In each insertion attempt, we should profile the following messages:
The relationship between particles and the region to be inserted: in the region to be inserted or within the cutoff of the region to be inserted;
The relationship between particles and finer grids: particle numbers and particle positions in each grids;
The marking information of finer grids.
After profiling the above messages, the WI-USHER algorithm uses these configuration info to exclude region to be inserted and improve the performance of the particle insertion operation.
Implementation and parallelized design
In this article, we implement our WI-USHER algorithm in the HAC software framework based on OpenFOAM and LAMMPS. OpenFOAM serves as the main solver and LAMMPS is used as a dynamic library to be called from OpenFOAM. The solver is an incompressible, single phase, Newtonian solver to accomplish the coupling simulation. The WI-USHER algorithm is implemented as a fix style command in LAMMPS. As there are multi-regions to be inserted during the coupling simulation, we need to deal with the newly created particles overlapping between neighbor processes as shown in Figure 11.

Diagram of neighbor process overlapping checking, only checking with neighbor who also has particle insertion operation.
In the coupling simulation, the degree of parallelism is
The detailed procedure of the WI-USHER algorithm in depicted in Algorithm 3.
Experiments
We demonstrate the superiority of the WI-USHER algorithm over the original USHER algorithm using two test cases and also list the worst case of our WI-USHER algorithm. Section “Platforms” introduces the platform we used, section “Periodic box” shows the analysis and results of the first case with periodic boundary condition, section “Channel flow past the rough wall” gives the analysis and results of the second case with rough boundary condition, and section “Worst case” presents the worst case of our modified algorithm.
Platforms
In the experiments, we use a HPC cluster situated in the State Key Laboratory of High Performance Computing. 24 This computing platform consists of hundreds of computing nodes and each node contains 12 Intel Xeon 2.1 GHz E5-2620 CPU cores and a total memory of 16 GB. We implement our HAC coupling framework using OpenFOAM, which is an excellent CFD open-source software with C++ object-oriented class libraries, and LAMMPS, which is a classical MD simulation code.
Periodic box
The first test case is a periodic box with liquid argon as mentioned in Delgado-Buscalioni and Coveney,
6
whose length is
The test case is with no external thermostat to prevent the temperature of the system increasing dramatically due to improper insertion. The insertion rate is kept at a small rate, so the system can relax to equilibrium after one insertion. Under the assumption of the hard sphere, the insertion rate should be smaller than the particle collision time, which is
The input parameters of the WI-USHER algorithm are as follows: the effective coverage radius
USHER: no finer grids, only set the maximum iteration times;
SPO: consider SPO of the grid, vary the grid spatial scale
SPC: consider single particle effective coverage of the grid, vary the grid spatial scale
MPC: consider the multi-particles effective coverage of the grid, vary the grid spatial scale
Input parameters of four test groups.
Especially, we also set five kinds of grid spatial scale for the last three test groups (SPO/SPC/MPC) to find the effect of grid spatial scale on the particle insertion operation. The effective coverage radius
Parameters of finer grids spatial scale.
As both algorithms use the iteration searching, the final finding position of 200 times insertion may be different between the WI-USHER algorithm and the USHER algorithm. The results of the first test case are demonstrated in two parts: first, we show the validity of our WI-USHER algorithm that using exclusive rules do not damage the correctness. Physical properties such as system temperature T, system pressure P, total energy per particle and excess energy per particle

Temperature for density from 0.8 to 1.0 of the periodic box case.

Pressure for density from 0.8 to 1.0 of the periodic box case.

Total energy per particle for density from 0.8 to 1.0 of the periodic box case.

Excess energy per particle for density from 0.8 to 1.0 of the periodic box case.
Second, we compare the efficiency of these two algorithms using the average force evaluation times as shown in Figure 16, average evaluation error as shown in Figure 17 within 200 times insertion. In each insertion attempt, the kernel part is the evaluation of force that imposed by other particles. Therefore, evaluating the force times depicts the efficiency of the algorithm.
6
At the same time, we also show the evaluation error between the target potential and the finding potential under different input parameters. These two results use the result of the USHER algorithm as the base line. Under different finer grid spatial scales, more finer the grids brings more storage consuming, on the other side, more sparer the girds, the SPC can only cover the local grid and lose the ability of covering neighbor grids. We also show the detailed insertion time info of these two algorithms, that is, the WI-USHER algorithm consists of iteratively searching

Average force evaluation times of 200 times insertion for the WI-USHER algorithm with three exclusive rules, SPO, SPC, and MPC, and the result of the USHER algorithm is as the base line.

Average evaluation error of 200 times insertion for the WI-USHER algorithm with three exclusive rules, SPO, SPC, and MPC, and the result of the USHER algorithm is as the base line.

Detailed insertion time info, total simulation time, and data maintenance time of the WI-USHER algorithm, and the result of the USHER algorithm is as the base line.
From the above results, we find that the average force evaluation times and insertion time of the WI-USHER algorithm almost perform better than the USHER algorithm. The SPO test group performs a little worse for exclusion few grids than the SPC group and the MPC group, even than the USHER. The SPC group and the MPC group exclude more grids and clearly direct to the final target region. The insertion time-consuming by the WI-USHER algorithm is almost less than the USHER algorithm; furthermore, the MPC group brings in more data maintenance time but excludes more grids. Under the five finer grids spatial scale, the h3 group performs the best with three exclusive rules.
Through these four test groups, we draw the following empirically conclusion: according to the definition of healthy distance, in 3D simulation, the body diagonal of finer girds is chosen as about
Channel flow past the rough wall
The second test case is the channel flow past the rough wall17,26 as shown in Figure 19. We use this test case to compare the parallel performance of these two algorithms when existing mass flux across the MD–continuum interface. The simulation domain is

Diagram of channel flow past the rough wall.
We present our WI-USHER algorithm superiority over the USHER algorithm under parallelism imbalance circumstance. The degree of parallelism is 12 and the MD domain is decomposed into 4 × 3, that is, four in the x-direction and three in y-direction, only four processes are inserted processes to accomplish the particle insertion operation. We show the average force evaluation times in Figure 20, the evaluation error in Figure 21, and the detailed time-consuming in Figure 22.

Average force evaluation times for the WI-USHER algorithm with three exclusive rules, SPO, SPC, and MPC, and the result of the USHER algorithm is as the base line.

Average evaluation error for the WI-USHER algorithm with three exclusive rules, SPO, SPC, and MPC, and the result of USHER is as the base line.

Detailed insertion time info, total simulation time, and data maintenance time of the WI-USHER algorithm.
The average force evaluation times of the USHER algorithm is 25,715 and the percentage of the particle insertion processing the total simulation time is 23.5%. The WI-USHER algorithm performs far superior to the USHER algorithm and the percentage decreases to only 3% of the total simulation time. The substantial decrease in the average force evaluation times leads to the alleviation of the imbalance of parallel particle insertion. We list the detailed percentage in Table 7, taking SPC-h3 as an example.
Detailed of parallel time-consuming.
Worst case
The original USHER algorithm uses the local potential landscape to efficiently find the target position and performs superiority over our WI-USHER algorithm when the number density among moderate values. The reason is that the WI-USHER algorithm needs to maintain the relationship between particles and finer grids, and after excluding particle grids, the WI-USHER algorithm traverses all the remained grids to find the target position. When the number density is moderate, there are multi possible positions that meeting the target, the USHER algorithm that totally random chooses initial position performs better. We use Table 8 to describe the worse circumstance of our WI-USHER algorithm. Under the same simulation condition as in section “Periodic box,” we expect the number density to increasing from 0.4 to 0.6 within 200 insertions. The average force evaluation times, evaluation error, and detailed insertion time (
Results of the worst case.
Conclusions
The HAC method based on domain decomposition serves as an important tool for micro-fluid simulation, which obtains both simulation accuracy and efficiency. In HAC coupling method, mass exchanging between the continuum region and the atomistic region is one of the key operations to keep correct data exchanging between the two simulation methods. In order to deal with the mass exchanging, the particle insertion algorithm was proposed to handle the transformation for macroscopic velocity to number of insertion particles. For dense micro-fluid simulation, there exists a certain degree of parallelism load imbalance when directly using the USHER algorithm in the HAC coupling simulation. As we have tested the efficiency of the USHER algorithm, when the density in the inserted region reaches a slight high to high value, the iteration times of the USHER algorithm tend to reach a sufficient high value which brings in the imbalance between different inserted processes. There is an urgent need to improve the efficiency of particle insertion algorithm to reduce its operation time during the coupling simulation to balancing the insertion workload between processes. The optimization efficiency of the particle insertion algorithm will further improve the total simulation efficiency of the HAC method.
In this article, we propose a grid-based parallel particle insertion algorithm, named WI-USHER. The idea of the algorithm is, first, slicing the region to be inserted into finer grids with proper spacing scale, second, marking grids in black according to the occupation between particles and grids, the SPC of neighbor grids and the MPC of neighbor girds and third, using a steepest decent method to find the target insertion point. We implement our HAC coupling framework based on OpenFOAM and LAMMPS as well as the WI-USHER algorithm. We use two test cases to show the superiority of our WI-USHER algorithm over the USHER algorithm. Using three exclusive rules and proper finer grid slicing, the WI-USHER algorithm performs lower averaged force evaluation times, which decreases from
We aim to provide more comprehensive implementation of our WI-USHER algorithm for optimization of parameters to further improve the efficiency and workload balancing.
Footnotes
Academic Editor: Xiaotun Qiu
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors would like to thank the National Key Research and Development Program of China (No. 2016YFB0201301), Science Challenge Project (No. JCKY2016212A502), and the open fund from the State Key Laboratory of High Performance Computing (Grant No. 201303-01, 201503-01, and 201503-02).
