Abstract
The hybrid atomistic–continuum coupling method based on domain decomposition is an important tool for microfluidic simulation. However, enabling such simulations often requires major modifications to existing codes, which poses significant difficulties. In this article, in order to provide an efficient and easy-to-use software framework for field users, we propose a hybrid atomistic–continuum parallel coupling framework, named HACPar, built on open-source software platforms. We abstract, for the first time, the software architecture of a hybrid atomistic–continuum coupling framework based on geometric decomposition, demonstrate the detailed implementation of the framework, and investigate in depth the coupling-oriented parallel issues, which may also improve the flexibility and efficiency of other multiscale parallel applications. Benchmark cases verify the correctness and efficiency of our HACPar framework, and the benchmark results show that the hybrid simulations scale up to 1536 cores.
Keywords
Introduction
Nanoscale fluid phenomena play an important role in many applications, such as micro- and nanoscale channel flow and boundary layer flow.1–3 Micro-electromechanical system (MEMS) devices and lab-on-a-chip devices are two typical cases that feature flow at micro- and nanometer scales. To investigate these phenomena, researchers usually turn to multiscale modeling methods. The advantage of multiscale modeling lies in its ability to reveal the properties of multiscale systems by capturing phenomena that span a wide range of time and length scales, beyond the reach of any single solver or method.4,5 In particular, the hybrid atomistic–continuum (HAC) method based on geometric coupling has seen rapid development, since it allows on-the-fly information exchange and interaction between multiple simulation regions.
In previous research, there have been few investigations of coupling simulation frameworks and coupling-oriented parallel issues. Delgado-Buscalioni 6 proposed a coupling simulation framework for molecular-continuum simulation deployed on grid-like computer architectures. Tang et al. 7 provided a universal interface for multiscale simulation, but only for particle-based frameworks. For existing molecular dynamics (MD) software, FrantzDale et al. 8 extended the multiscale simulation capability of LAMMPS. 9 For the mesh-particle configuration of the coupling framework, Neumann et al. 10 proposed the MaMiCo framework, which provides interfaces for multiscale simulation together with several investigations of parallel issues. Cosden and Lukes 11 designed a preliminary framework based on OpenFOAM 12 and LAMMPS, 9 but without deep parallelization design or abstraction.
Although there has been substantial theoretical research, nontrivial challenges remain when turning the theory into practice. In previous research, researchers often used in-house codes or carried out limited work on top of open-source software frameworks. Nevertheless, designers tend to treat the open-source software as a black box or a gray box and lack a deep understanding of its internal configuration; as a result, the coupling computational platform is less effective and less systematic. Although open-source platforms such as OpenFOAM and LAMMPS provide a certain degree of parallel optimization, the design of parallel optimizations oriented to coupling methods has not yet been studied in depth. Moreover, field experts have little hands-on experience in modifying existing frameworks when they want to add new coupling strategies, and field users need an easy-to-use computational platform with user-friendly pre-processing and post-processing interfaces. A practicable approach is for computing experts to provide a highly modular, low-development-cost, and easy-to-use coupling computational framework for field experts.
Our research group has studied the OpenFOAM software platform in depth, and on multiscale simulation in particular we have carried out several studies.13,14 The development of an HAC software framework faces intrinsic challenges, namely the correctness of the theoretical coupling model, the choice of sampling data storage on the atomistic part, and the efficiency of data exchange. In this article, focusing on the design of a mesh-particle multiscale coupling framework, we propose an HAC parallel coupling framework, named HACPar, which provides an efficient and easy-to-use software framework for field users. The main contributions of this article are summarized below:
We abstract, for the first time, the software architecture of the HAC coupling framework based on geometric decomposition to guide further development. We provide the detailed design of the coupling interfaces, data structure organization, process organization, and multiscale visualization.
We present deep research on coupling-oriented parallel issues, including decomposition of the communication world, smart data exchange, coupling-oriented load balancing, and a parallel-optimized particle insertion algorithm.
We implement the parallel coupling framework HACPar based on OpenFOAM and LAMMPS, together with a user-friendly multiscale pre-processing and post-processing tool based on SALOME. 15 We verify the correctness and efficiency of our HACPar framework through benchmark cases.
The remainder of this article is organized as follows: section “Theory and methodology” reviews the HAC simulation methodology, coupling-oriented parallel issues, and multiscale visualization. Section “Basic design and implementation” presents the basic design and implementation of HACPar based on OpenFOAM and LAMMPS and the pre-processing and post-processing tool based on SALOME. Section “Advanced coupling-oriented parallel optimization” describes the advanced design and implementation of coupling-oriented parallel optimization. Section “Experiments” verifies the efficiency of our framework using benchmark cases. Finally, section “Conclusion” concludes the article and outlines future research directions.
Theory and methodology
In this section, we review the HAC coupling method based on domain decomposition, the parallel issues involved in the simulation, and the multiscale visualization tools.
HAC coupling
With the rapid development of high performance computing, the simulation capability of classical molecular dynamics has been extended, but it still cannot handle very large-scale particle simulation applications. The HAC method restricts molecular dynamics simulation to the regions where atomistic scales need to be resolved, using a continuum-based solver for the remainder of the domain, thereby achieving both simulation accuracy and efficiency.
For the two-dimensional Couette flow, 16 the domain is split into an atomistic region and a continuum region, with an overlap region used to alleviate dramatic density oscillations and to couple the results of the two regions. The outer boundary of the continuum region, which resides in the atomistic region, is called the hybrid solution interface (HSI). Later HAC models emerged that differ in their coupling strategies, boundary condition extraction, and nonperiodic boundary force models.4,17,18 Generally, the overlap region contains a control region (

Domain decomposition of benchmark case (Couette flow).
One common coupling technique is to divide the overlap region into
The continuum part of the HAC method solves the incompressible Navier–Stokes (N–S) equations. The N–S equation and the continuity equation are given by

∂u/∂t + (u · ∇)u = −(1/ρ)∇p + ν∇²u,  ∇ · u = 0

where u is the fluid velocity, p is the pressure, ρ is the density, and ν is the kinematic viscosity.
The truncated and shifted Lennard–Jones (LJ) potential is used to model the interactions between fluid particles as well as wall particles in the atomistic region. The potential is given by

U(r) = 4ε[(σ/r)¹² − (σ/r)⁶] − 4ε[(σ/r_c)¹² − (σ/r_c)⁶] for r ≤ r_c, and U(r) = 0 for r > r_c

where ε is the depth of the potential well, σ is the particle length scale, and r_c is the cutoff radius.
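In reduced LJ units (ε = σ = 1) the truncation and shift can be sketched as follows; the function names and the cutoff value r_c = 2.5 are illustrative choices, not mandated by the framework:

```cpp
#include <cassert>
#include <cmath>

// Plain Lennard-Jones potential in reduced units (epsilon = sigma = 1).
double lj(double r) {
    double inv6 = 1.0 / std::pow(r, 6);
    return 4.0 * (inv6 * inv6 - inv6);
}

// Truncated and shifted LJ: subtract the value at the cutoff rc so the
// potential goes continuously to zero at r = rc, and is zero beyond it.
double lj_truncated_shifted(double r, double rc) {
    if (r >= rc) return 0.0;
    return lj(r) - lj(rc);
}
```

The shift removes the energy discontinuity at the cutoff, which would otherwise inject energy into the atomistic region whenever a pair distance crosses r_c.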
Coupling-oriented parallel issue
The increasing demands on simulation size and accuracy for practical problems result in huge computational requirements, so there is an urgent need for massively parallel simulation.
At present, most CFD software uses parallel algorithms based on mesh decomposition. The mesh cells are decomposed into multiple parts before the simulation starts; each process then reads one part of the cells and begins calculating. The limiting factor for parallel scaling is the communication between processes. Taking OpenFOAM 12 as an example: in principle OpenFOAM can scale to massively parallel simulation, but owing to the limits of the current computational method, the maximum parallel size usually does not exceed 1000 cores. 22 The usual parallel optimization methods are improving the numerical method, combining multi-threading with processes, and reducing the communication overhead. For MD software, the optimization methods are dynamic load balancing, modified integration methods, and specialized processors. For coupling simulation, however, researchers have so far paid little attention to these parallel issues.7,23
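The mesh decomposition step can be illustrated with a minimal contiguous block partition; real decomposers (for example, OpenFOAM's decomposePar with graph-partitioning methods) are far more sophisticated, so this is only a sketch with illustrative names:

```cpp
#include <cassert>
#include <utility>

// Contiguous block decomposition of nCells mesh cells among nProcs
// processes: every rank receives an (almost) equal share, so each
// process can read and compute on its own part independently.
// Returns the half-open [begin, end) cell index range for 'rank'.
std::pair<long, long> cellRange(long nCells, int nProcs, int rank) {
    long base = nCells / nProcs;
    long extra = nCells % nProcs;  // the first 'extra' ranks get one more cell
    long begin = rank * base + (rank < extra ? rank : extra);
    long end = begin + base + (rank < extra ? 1 : 0);
    return {begin, end};
}
```

The ranges tile the whole mesh with a size difference of at most one cell between ranks, which is the balance property a static decomposition aims for.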
Multiscale visualization and analysis
At present, there are many commercial and open-source pre- and post-processing software packages for single-scale simulation, but scarcely any for multiscale simulation. For CFD, ICEM CFD, 24 PointWise, 25 and GridPro 25 are often used for the generation of structured and unstructured meshes. For data visualization, ParaView 26 and GLView 27 are two typical packages for visualizing and analyzing results. Notably, the SALOME software platform generates many pre- and post-processing modules through the CORBA technique. For MD, OVITO 28 and VMD 29 are post-processing tools, while there are few pre-processing platforms. In summary, for multiscale simulation there is still no uniform pre- and post-processing platform available for field users.
Basic design and implementation
In this section, we present the basic design and implementation of our HACPar framework based on OpenFOAM, LAMMPS, and SALOME. Subsection “Design of HACPar framework” presents the abstraction of the framework for the first time. Subsection “Kernel coupling algorithms” describes the implementation of the kernel coupling algorithms. Subsection “HACP3: pre-processing and post-processing tools” presents the uniform pre- and post-processing tool for our HACPar framework.
Design of HACPar framework
The kernel coupling solver of our HACPar framework is implemented in C++ on top of the open-source software platforms OpenFOAM and LAMMPS, with OpenFOAM serving as the base of the framework and LAMMPS as a dynamically loadable library. The uniform pre- and post-processing tool, HACP3, is implemented on the SALOME software platform. Focusing on the coupling input parameters, coupling configuration, and post-visualization, we add or redesign modules and user interfaces in these three platforms. This work provides a user-friendly, efficient, massively parallel simulation platform for field users. In section “Theory and methodology,” we introduced the theoretical model of state-variable coupling and the associated coupling procedures. Here, at the software design level, we present the abstraction diagram of our HACPar framework, as shown in Figure 2, which presents the detailed implementation of the coupling framework for the first time. The newly added or redesigned modules are marked in gray.

Overall design and abstraction of HACPar software framework.
We categorize the newly added and redesigned modules into three parts, pre-processing, kernel solver, and post-processing, to illustrate our work. The pre-processing part includes the
Kernel coupling algorithms
As mentioned in section “Theory and methodology,” the main procedures needed to perform an HAC coupling simulation are the atomistic-to-continuum coupling, the continuum-to-atomistic coupling, the nonperiodic boundary force, and particle insertion. In our HACPar framework, each is built as a module. We present the detailed design of the first three procedures in this subsection; the last is presented in subsection “Particle insertion interface.” The main workflow of the HACPar framework is given in Algorithm 1.
In a mesh-particle coupling framework, in order to transmit data from the atomistic part to the continuum part, the kernel operation is to determine the sampling volume, the sampling interval, and the source and destination of the transmission. In the atomistic part, we must establish the relationship between each particle and its sampling volume, that is, the
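As an illustration of this sampling step, the following sketch bins the particles of a one-dimensional region into uniform sampling cells and averages their velocities per cell; the real framework operates on a 3D mesh, and all names here are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Map each particle to the sampling cell that contains it on a uniform
// 1D grid of nCells cells spanning [0, length), then average the
// particle velocities per cell. This is the kernel operation of the
// atomistic-to-continuum data transmission: the per-cell means are the
// macroscopic samples handed to the continuum solver.
std::vector<double> cellMeanVelocity(const std::vector<double>& pos,
                                     const std::vector<double>& vel,
                                     double length, int nCells) {
    std::vector<double> sum(nCells, 0.0);
    std::vector<int> count(nCells, 0);
    double width = length / nCells;
    for (size_t i = 0; i < pos.size(); ++i) {
        if (pos[i] < 0.0 || pos[i] >= length) continue;  // outside the sampled region
        int c = static_cast<int>(pos[i] / width);
        sum[c] += vel[i];
        ++count[c];
    }
    for (int c = 0; c < nCells; ++c)
        if (count[c] > 0) sum[c] /= count[c];
    return sum;
}
```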
In the continuum-to-atomistic data transmission, we use the constrained Lagrangian dynamics method 16 to impose the cell data on the particles residing in that cell. The detailed implementation is presented in Algorithm 4.
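A minimal sketch of the effect of such a constraint, assuming the simplest realization (a uniform shift of the particle velocities in a cell so that their mean matches the continuum value, which is not the full constrained-dynamics formulation), is:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Shift all particle velocities in one overlap cell so that their mean
// matches the continuum cell velocity uCell. A uniform shift preserves
// the relative (thermal) motion of the particles; the constrained
// Lagrangian dynamics of O'Connell and Thompson applies an analogous
// correction inside the equations of motion.
void constrainCellVelocity(std::vector<double>& vel, double uCell) {
    if (vel.empty()) return;
    double mean = 0.0;
    for (double v : vel) mean += v;
    mean /= vel.size();
    for (double& v : vel) v += uCell - mean;
}
```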
HACP3: pre-processing and post-processing tools
Owing to the absence of uniform pre-processing and post-processing tools for multiscale simulation, we design our HACP3 tools, generated within the SALOME platform using the CORBA technique. The HACP3 tools help field users define coupling configurations and analyze simulation results.
User defines couple dict
To launch an original OpenFOAM or LAMMPS simulation, users define the simulation parameters and control parameters for a case. In order to perform HAC simulation, using the OpenFOAM launching mechanism as the base, the HACP3 tools interact with field users to define the coupling simulation parameters and control parameters and generate the coupling instruction file
Input parameters for HACPar simulation.
MD: molecular dynamics; CFD: computational fluid dynamics; HACPar: hybrid atomistic–continuum parallel coupling framework.
Multiscale visualization
Two kinds of simulation methods are involved in one coupling simulation, each of which outputs different data and files. Efficiently resolving these files helps experts analyze simulation results and advance their research. The simulation data on the atomistic part are particle velocities, particle positions, and so on, while on the continuum part they are cell velocities. This function is implemented by adding a new module, DATASTDL, 31 to the SALOME platform as part of our HACP3 tools.
We transform the microscopic data from the atomistic solver to macroscopic data using the cumulative averaging method (CAM). 32 Each process produces time frames of simulation results for the continuum solver and the atomistic solver. We should transform these data using
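The CAM averaging used here (cumulative averaging, in the hybrid-coupling literature) can be maintained incrementally without storing the individual samples; a small sketch with illustrative names:

```cpp
#include <cassert>
#include <cmath>

// Cumulative averaging (CAM): fold each new instantaneous sample of a
// cell quantity into a running mean, which smooths the thermal noise of
// the microscopic data before it is handed to the continuum side.
struct CumulativeAverage {
    double mean = 0.0;
    long n = 0;
    void add(double sample) {
        ++n;
        mean += (sample - mean) / n;  // incremental update, no sample storage
    }
};
```

The incremental form keeps memory use constant regardless of how many time frames contribute to the average.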
Advanced coupling-oriented parallel optimization
In section “Basic design and implementation,” we designed the basic framework of HACPar with little consideration of parallelism. Even though the HAC method greatly improves simulation efficiency compared to a full molecular simulation, coupling-oriented issues must still be carefully considered. In this section, we present the deep parallel optimization of our HACPar coupling framework, including the organization of parallel processes, a highly efficient particle insertion algorithm, and coupling-oriented load balancing.
Parallel communication interface
The parallel organization of HACPar builds on the original parallel mechanism of OpenFOAM, using MPI to create communication worlds and to send and receive messages. In OpenFOAM, there is only one communication world,
In the HAC coupling simulation, the atomistic region and the continuum region are decomposed across processes through a certain decomposition method. For the continuum part, let
Let

Smart sending mechanism between CFD and MD communicators including
We define the copy 0 of the atomistic region as the
We should also pay attention to the mapping between processes and computing nodes. Because the major computational time resides in the MD part, HACPar launches many processes, which consist of one CFD communicator and several MD communicators; these processes must be distributed across multiple computing nodes for parallel processing. There are two types of communication, intergroup and intragroup, and the overheads differ between communication across computing nodes and communication within one node. Basically, communication between two computing nodes must send and receive messages through the network interface, whereas intra-node communication can be completed by memory copy; therefore, intra-node communication is faster than inter-node communication. In principle, we should map the processes that communicate most with each other to the same computing node to reduce the overhead.
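The decomposition of the communication world described in this subsection can be realized by computing a split "color" per rank and passing it to MPI_Comm_split; the layout below (CFD ranks first, then MD groups of fixed size) is an illustrative assumption, not necessarily HACPar's actual scheme:

```cpp
#include <cassert>

// Assign each global rank a communicator "color": color 0 forms the
// single CFD communicator; the remaining ranks are split into MD
// communicators of mdGroupSize ranks each (colors 1, 2, ...). The
// returned color could be fed to MPI_Comm_split(MPI_COMM_WORLD, color,
// rank, &newComm) to carve the communication world into subgroups.
int communicatorColor(int rank, int nCfd, int mdGroupSize) {
    if (rank < nCfd) return 0;                // CFD communicator
    return 1 + (rank - nCfd) / mdGroupSize;   // MD communicator index
}
```

Keeping the color computation as a pure function of the rank makes the decomposition deterministic, so every process derives the same group layout without any extra communication.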
Particle insertion interface
To cope with the mass exchange between the continuum region and the atomistic region, a particle insertion algorithm is needed. In previous research, Delgado-Buscalioni and Coveney 33 proposed the sequential USHER algorithm for inserting particles into a dense fluid, and Neumann and Tchipev 34 implemented a parallel version on the link-cell organization. In our previous research, 20 we found that the efficiency of the particle insertion algorithm seriously affects the efficiency of the whole coupling simulation. Therefore, we designed a parallel particle insertion algorithm, named WI-USHER, that remarkably improves performance in multiscale simulation. More details of the theoretical idea can be found in our previous work. 20
In this section, we present the detailed design of our WI-USHER algorithm for the
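The USHER idea is a steepest-descent search that moves a trial particle through the potential-energy landscape until its insertion energy reaches a prescribed target. The following one-dimensional toy (not the actual WI-USHER algorithm) illustrates the descent rule with a Newton-like step length capped at dsMax; all names and parameter values are illustrative:

```cpp
#include <cassert>
#include <cmath>

// Plain LJ potential and its derivative in reduced units.
double ljU(double r)  { double i6 = 1.0 / std::pow(r, 6); return 4.0 * (i6 * i6 - i6); }
double ljdU(double r) { return 4.0 * (-12.0 * std::pow(r, -13) + 6.0 * std::pow(r, -7)); }

// Steepest-descent insertion between two fixed particles at x = 0 and
// x = L: move the trial position until its total LJ energy reaches the
// target insertion energy u0. The step length is the Newton estimate
// |U - u0| / |dU/dx|, capped at dsMax, in the spirit of USHER.
double insertParticle(double L, double u0, double x0,
                      double dsMax = 0.05, double tol = 1e-6, int maxIter = 500) {
    double x = x0;
    for (int it = 0; it < maxIter; ++it) {
        double u = ljU(x) + ljU(L - x);
        double du = ljdU(x) - ljdU(L - x);  // dU/dx at the trial position
        if (std::fabs(u - u0) < tol) return x;
        double step = std::fmin(dsMax,
                                std::fabs(u - u0) / std::fmax(std::fabs(du), 1e-12));
        double sgn = (du > 0.0) ? 1.0 : -1.0;
        // move downhill if the energy is too high, uphill if too low
        x += (u > u0 ? -sgn : sgn) * step;
    }
    return x;  // best effort if not converged
}
```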
Coupling-oriented load balancing
Load balancing differs crucially between the HAC multiscale method and the original single-scale method because of the overlap region and the coupling procedures. Since the major simulation time is spent in the atomistic solver, the default parallel decomposition of the atomistic region cannot meet the needs of coupling-oriented load balancing. In our previous research, 35 the particles in the overlap region are named as

Detailed coupling procedures operated on

Three kinds of decomposition style for the particle region, taking degree of parallelism 12 as an example. The
This load balancing technique takes the coupling procedures into account and properly handles coupling-oriented decomposition. By choosing a proper decomposition of the particle region, we can further improve the efficiency of the HAC simulation.
Experiments
In this section, we verify our HACPar framework through benchmark cases and also illustrate its scaling and efficiency.
Platforms
In the experiments, we use an HPC cluster situated in the State Key Laboratory of High Performance Computing. 36 This computing platform consists of hundreds of computing nodes, and each node contains 12 Intel Xeon 2.1-GHz E5-2620 central processing unit (CPU) cores and a total memory of 16 GB.
Validation
In this section, we validate our framework using benchmark cases by comparing simulation results with the analytical solution, that is, the typical no-slip sudden start Couette flow proposed in O’Connell and Thompson. 16
The analytical solution for a sudden start Couette flow is given by equation (4)
where
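One standard series form of the transient Couette solution, assuming the wall at y = H impulsively starts with speed U while the wall at y = 0 stays at rest and ν denotes the kinematic viscosity, is u(y, t) = U[y/H + (2/π) Σₙ ((−1)ⁿ/n) sin(nπy/H) exp(−n²π²νt/H²)]; the paper's equation (4) may use an equivalent but different convention. A direct evaluation of this series:

```cpp
#include <cassert>
#include <cmath>

// Transient Couette profile: wall at y = H starts moving with speed U at
// t = 0, wall at y = 0 at rest, kinematic viscosity nu. Series form of
// the analytical solution, truncated at nTerms terms.
double couetteU(double y, double t, double U, double H, double nu, int nTerms = 200) {
    const double kPi = std::acos(-1.0);
    double s = 0.0;
    for (int n = 1; n <= nTerms; ++n) {
        double sign = (n % 2 == 0) ? 1.0 : -1.0;  // (-1)^n
        s += sign / n * std::sin(n * kPi * y / H)
               * std::exp(-double(n) * n * kPi * kPi * nu * t / (H * H));
    }
    return U * (y / H + 2.0 / kPi * s);
}
```

At t → 0⁺ the exponentials are near one and the series cancels the linear term away from the moving wall, while at large t the profile relaxes to the linear steady state U·y/H, matching the evolution shown in the velocity profiles.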
We initialize the simulation domain and the associated parameters using our pre- and post-processing tool HACP3, following the setup used by Troian and Thompson. 21 The height of the channel is

Snapshot of the domain decomposition configuration of the sudden start Couette case. The upper part is the continuum region, while the lower part is the atomistic region. The overlap region is not shown here.

Snapshot of the coupling parameter configuration in HACP3, taking parts of configuration process as an example.
Initially, the mean fluid velocity is zero in the whole simulation domain. At

Velocity evolution profiles averaged over five time intervals compared with the analytical solution.

Snapshot of the unified post-visualization of the sudden start Couette case, using the transformed unified VTK files.
Efficiency
We test the efficiency of our HACPar framework from two aspects, namely particle insertion efficiency and coupling-oriented load balancing. We consider the channel flow past a rough wall 38 with mass flux across the MD–continuum interface, as depicted in Figure 10. The simulation domain is

Diagram of channel flow past the rough wall.
The WI-USHER algorithm far outperforms the USHER algorithm, and the share of particle insertion decreases to only 3.2% of the total simulation time. The substantial decrease in the average number of force evaluations alleviates the imbalance of parallel particle insertion. The detailed percentages are listed in Table 2.
Details of parallel time consumption for each component.
Next, we test the parallel decomposition method derived from our
Different decomposition styles for the particle region.
The desired decomposition style we obtained is the
Details of communication and coupling time-consumption ratios.
Scalability
In order to test the scalability of our HACPar framework, we consider a 3D benchmark case, depicted in Figure 11, with two different setups, listed in Table 5, to evaluate the parallel performance of HACPar. The geometry is a 3D channel surrounded by a solid planar wall. The top and bottom walls are constructed in the

Schematic diagram of 3D simulation geometry and the atomistic region located in the central part of the simulation domain.
Description of two different setups.
MD: molecular dynamics; CFD: computational fluid dynamics.
We present strong scaling experiments for these two setups on our experimental platform; the speedups are shown in Figures 12 and 13. Our platform has 12 cores per node. In these two tests, we vary the degree of parallelism from 1 to 1536. Because the introduced coupling overhead itself scales well, our HACPar platform achieves good scaling results and retains the potential for stronger scalability through further parallel optimization.

Strong scaling for the

Strong scaling for the
Conclusion
In this article, we propose an HAC parallel coupling framework, HACPar, based on OpenFOAM, LAMMPS, and SALOME. In order to provide an effective and easy-to-use software framework for field users, we first abstract the software diagram of the HAC coupling framework based on geometric decomposition and present the implementation of the coupling interfaces, data structures, and so on. Given the limited research on coupling-oriented parallel issues, we present deep research on these issues, including decomposition of the communication world, smart data exchange, coupling-oriented load balancing, and a parallel-optimized particle insertion algorithm. The benchmark cases verify the correctness and efficiency of our HACPar framework, with the overhead of the associated coupling procedures generally limited to 13%. The strong scaling test also shows the potential of HACPar for larger-scale parallel simulation.
The optimization techniques for coupling-oriented parallel issues should be investigated further. The
Footnotes
Academic Editor: Xiaotun Qiu
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Key Research and Development Program of China (no. 2016YFB0201301), Science Challenge Project (no. JCKY2016212A502), and the open fund from the State Key Laboratory of High Performance Computing (grant nos 201303-01, 201503-01, and 201503-02).
