Abstract
The hybrid atomistic–continuum coupling method based on domain decomposition serves as an important tool for the microfluidic simulation. However, major modifications to existing codes are often required to enable such simulations, which pose significant difficulties. In this article, in order to provide an efficient and easy-to-use software framework for field users, we propose a hybrid atomistic–continuum parallel coupling framework, named HACPar, based on open-source software platforms. We abstract the software architecture of the hybrid atomistic–continuum coupling framework based on geometric decomposition for the first time, demonstrate the detailed implementation of the framework, and present deep research on the coupling-oriented parallel issues which may improve the flexibility and efficiency of other multiscale parallel applications. The benchmark cases verify the correctness and efficiency of our HACPar framework. The benchmark results show that the scalability of the hybrid simulations is reached up to 1536 cores.
Keywords
Introduction
The nanoscale fluid phenomena play a quite important role in many applications, such as micro- and nanoscale channel flow and boundary layer flow.1–3 Micro-electromechanical system (MEMS) devices and lab-on-a-chip devices are two typical cases that feature flow at micro- and nanometer scales. In order to investigate these phenomena, researchers usually use the multiscale modeling methods. The advantages of multiscale modeling lie in the ability of revealing properties of multiscale systems by capturing phenomena that appear in a wide range of time and length scales, which are beyond any single solver and method.4,5 Especially, the hybrid atomistic–continuum (HAC) method based on geometric coupling has seen rapid development since it allows on-the-fly information exchange and interaction between multiple simulation regions.
In the previous research, there are few investigations on the coupling simulation framework and coupling-oriented parallel issues. Delgado-Buscalioni 6 proposes a coupling simulation framework for molecular-continuum simulation that deployed on grid-like computer architectures. Tang et al. 7 provide a universal interface for multiscale simulation but based on only particle framework. For the existing molecular dynamics (MD) software, FrantzDale et al. 8 extend the multiscale simulation ability of LAMMPS 9 software. For the mesh-particle configuration of the coupling framework, Neumann et al. 10 propose the MaMico framework that provides interfaces for multiscale simulation and several investigations on parallel issues. Cosden and Lukes 11 design a preliminary framework based on OpenFOAM 12 and LAMMPS 9 but less of deep parallelization design and abstraction.
Although there have been substantial theoretical research, there are still nontrivial challenges when turning the theory into practice. In the previous research, the researcher often uses the house design codes and performs several work on the open-source software frameworks. Nevertheless, the designers tend to regard the open-source software as the black box or the gray box and lack the deep investigation of inside software configuration. Therefore, the coupling computational platform serves less effectively and systematically. Although the open-source platforms such as OpenFOAM and LAMMPS have certain parallel optimization, there is still an urgent need of how to design the optimized parallel issue oriented to coupling methods which has not been in-depth study. Moreover, the field experts have less experience on hand-on modifying the existing frameworks when they want to add new coupling strategies, and also field users need an easy-to-use computational platform with user-friendly pre-processing and post-processing user interfaces. The practicable method is the computer expert providing a high modularization, low development cost, and easy-to-use coupling computational framework for field experts.
Our research group has made a deep study on the OpenFOAM software platform. Especially on the multiscale simulation, we have made several researches.13,14 The development of the HAC software framework has intrinsic challenges, that is, the correctness of theory coupling model, the choice of sampling data storage on the atomistic part, and the effectiveness of data exchanging. In this article, focusing on the design of mesh-particle multiscale coupling framework, we propose an HAC parallel coupling framework, named HACPar, which provides an efficient and easy-to-use software framework for field users. The main contributions of this article are summarized below:
We abstract the software architecture of the HAC coupling framework based on geometric decomposition for guiding further development for the first time. We provide the design of the coupling interfaces, data structure organization, process organization, and the multiscale visualization in detail.
We present deep research on the coupling-oriented parallel issues including decomposition of the communication world, smart data exchanging, coupling-oriented load balancing, and the parallel optimized particle insertion algorithm.
We implement the parallel coupling framework HACPar based on OpenFOAM and LAMMPS and the user-friendly multiscale pre-processing and post-processing tool based on SALOME. 15 We verify the correctness and efficiency of our HACPar framework through benchmark cases.
The remainder of this article is organized as follows: section “Theory and methodology” reviews the HAC simulation methodology, coupling-oriented parallel issues, and the multiscale visualization. Section “Basic design and implementation” presents the basic design and implementation of HACPar based on OpenFOAM and LAMMPS and the pre-processing and the post-processing tool based on SALOME. Section “Advanced coupling-oriented parallel optimization” describes the advanced design and implementation of coupling-oriented parallel optimization. Section “Experiments” verifies the efficiency of our framework using benchmark cases. Finally, section “Conclusion” concludes this article and brings up future research expectation.
Theory and methodology
In this section, we review the HAC coupling method based on domain decomposition, the parallel issues involved in the simulation, and the multiscale visualization tools.
HAC coupling
With the rapid development of high performance computing, the simulation ability of classical molecular dynamic is extended but still cannot fulfill very large-scale particle simulation applications. HAC method limits the use of molecular dynamic simulation to regions where the atomistic scales needed to be resolved, using a continuum-based solver for the remainder of the domain, which obtains both simulation accuracy and efficiency.
For the two-dimensional Couette flow, 16 the domain is split into an atomistic region and a continuum region, using an overlap region to alleviate dramatic density oscillation and couple the results of these two regions. The outer boundary of the continuum region which resides in the atomistic region is called hybrid solution interface (HSI). Later, HAC models emerged differing in forms of coupling strategies, boundary condition extraction, and nonperiodic boundary force models.4,17,18 Generally, the overlap region contains a control region (control region), a buffer region (buffer region), an atomistic-coupling-to-continuum-region (A → C region) and a continuum-coupling-to-atomistic region (C → A region). In the control region, there is a reflecting wall boundary condition to prevent particles leaving the atomistic region freely, and the outer boundary is the MD-continuum interface. The schematic diagram of the domain decomposition–based HAC method is depicted in Figure 1.

Domain decomposition of benchmark case (Couette flow).
One of the coupling tricks is to divide the overlap region into bins. In order to exchange data between computational fluid dynamics (CFD) and MD, we usually choose the volume of a bin equal to the volume of one mesh cell. In order to remedy the pressure lose in the nonperiodic boundary, we apply the nonperiodic boundary force 19 in the control region. We use the parallel WI-USHER algorithm 20 to deal with the mass flux exchanging in the control region, which is later presented as the detailed implementation in our framework. The data exchanging between the continuum region and the atomistic region uses the constraint Lagrangian method and the sampling averaging method.
The solver for the continuum part of the HAC method is the incompressible Navier–Stokes (N-S) equation. The N-S equation and the continuity equation are given by
where u is the bulk flow velocity, P is the pressure,
The truncated and shifted Lennard–Jones (LJ) potential is used to model the interactions between fluid particles as well as wall particles in the atomistic region. The potential is given by
where
Coupling-oriented parallel issue
With the increasing need on the size and accuracy of simulation for the practical problems, this results in the huge amount of the computational power. There is a great urgent to perform massively parallel simulation.
At present, most of the CFD softwares use the parallel algorithms based on mesh decomposition. The mesh cells will be decomposed into multiple parts before the simulation starts and then each process reads one part of the cells and begins the calculating. The limit of parallel scaling is the communication between processes. Take OpenFOAM 12 as an example, OpenFOAM can unlimitedly scale to massively parallel simulation, but for the limit of the current computational method, the maximum parallel size usually does not exceed 1000 cores. 22 The usual parallel optimization methods are improvement of the numerical method, using multi-threads with processes, and optimization of the communication overhead. For the MD software, the optimization methods are dynamic load balancing technique, modification of integration method, and using specific processors. While for the coupling simulation, the researchers have few investigations on these parallel issues.7,23
Multiscale visualization and analysis
At present, there are many commercial and open-source pre- and post-processing software for single-scale simulation but scarcely for multiscale simulation. For CFD, ICEM CFD, 24 PointWise, 25 and GridPro 25 are often used for the generation of the structural and nonstructural mesh. For data visualization, Paraview 26 and GLView 27 are two typical softwares for visualizing and analyzing results. Especially, SALOME software platform generates many pre- and post-processing modules through COBRA technique. For MD, OVITO 28 and VMD 29 are post-processing software while few pre-processing platform. In summary, in the multiscale simulation, there is still no available uniform pre- and post-processing platform for field users to use.
Basic design and implementation
In this section, we present the basic design and implementation of our HACPar framework based on OpenFOAM, LAMMPS, and SALOME. Subsection “Design of HACPar framework” presents the abstraction of the framework at the first time. Subsection “Kernel coupling algorithms” describes the implementation of kernel coupling algorithms. Subsection “HACP3: pre-processing and post-processing tools” presents the uniform pre- and post-processing tool for our HACPar framework.
Design of HACPar framework
The kernel coupling solver of our HACPar framework is implemented based on the open-source software platform OpenFOAM and LAMMPS using C++ language, which OpenFOAM serves as the base of the framework and LAMMPS as a dynamics loadable library. The uniform pre- and post-processing tool, HACP3, is implemented based on SALOME software platform. Focusing on the coupling input parameters, coupling configuration, and post-visualization, we add or redesign the modules and user interfaces in these three platforms. This work provides a user-friendly, efficient, massively parallel simulation platform for field users. In section “Theory and methodology,” we have mentioned the theory model of state-variable coupling and associated coupling procedures. Here, at the software design level, we present the abstraction diagram of our HACPar framework as shown in Figure 2, which is the first time to present the detailed implementation of the coupling framework. The new added or redesigned modules are marked in gray.

Overall design and abstraction of HACPar software framework.
We categorize the newly added and redesigned modules into three parts: pre-processing, kernel solver, and post-processing to illustrate our work. The pre-processing part includes the User Dict, the Couple Dict, and the Coupling-Oriented Load Balance. The coupling solver part includes the Atomistic Data Sampler, the Continuum Data Scatter, the Nonperiodic Boundary Force, the Particle Insertion Interface, and the Parallel Communication Interface. The post-processing part includes the Coupling Data Transformation and the Multiscale Visualization:
User Dict and Couple Dict. In the original single solver, OpenFOAM and LAMMPS use the input text files to initiate the simulation domain, lattice configuration, and solver parameters. In coupling simulation, we redesign the input files consisting of User Dict and Couple Dict to meet the simulation requirement and generate the pre-processing tool in HACP3.
Coupling-Oriented Load Balance. This is the parallel optimization method for the intrinsic load imbalance phenomena in the HAC coupling simulation. In our previous research, we present the quantitative and qualitative research on these phenomena due to the coupling procedures. This module uses the simulation size as the input to adjust more preferable size of sub-mesh and sub-domain.
Atomistic Data Sampler. Data exchanging interface from the atomistic solver to the continuum solver, using different sampling strategies, averaging and sampling data to transmit to the continuum part.
Continuum Data Scatter. Data exchanging interface from the continuum solver to the atomistic solver, using different constraint coefficient, scattering data to the atomistic part.
Nonperiodic Boundary Force. Remedy for the pressure lose in the atomistic region, using different types of boundary force models.
Particle Insertion Interface. Parallel particle insertion algorithm, named WI-USHER, through profiling of the inserted bin, locally inserting particles into the inserted bin.
Parallel Communication Interface. The most important parallel supporting mechanism for the coupling framework, including multi-Comm initialization, inter-Comm interface, and intra-Comm interface.
Coupling Data Transformation. Transforming the microscopic data to the macroscopic data through statistical methods, which are generated in HACP3.
Multiscale Visualization. Relying on the VTK file and SALOME platform to generate unified visualization which is generated in HACP3.
Kernel coupling algorithms
As we have mentioned in section “Theory and methodology,” the main procedures needed to perform HAC coupling simulation are the atomistic to continuum coupling, the continuum to atomistic coupling, nonperiodic boundary force, and particle insertion. In our HACPar framework, each one is built as one module. We present the detailed design of the first three procedures in this subsection, and the last one is presented in subsection “Particle insertion interface.” We present the main workflow of HACPar framework in Algorithm 1.
In mesh-particle coupling framework, in order to transmit data from the atomistic part to the continuum part, the kernel operation is to make sure the sampling volume, the sampling interval, the source, and destination of transmission. In the atomistic part, we must certify the relationship between the particle and the sampling volume, that is, the bin. According to the previous section, in the overlap region, the bin is chosen the same as the mesh cell. We can use the mesh cell properties to handle this relationship as illustrated in Algorithm 2. The algorithm of Atomistic Data Sampler is shown in Algorithm 3. In the HAC method, the atomistic region is usually chosen where the physical properties changing dramatically or near the boundary layer. In order to facilitate the data exchanging in the overlap region, we choose to build the structure mesh.
In the continuum to atomistic data transmission, we use the constraint Lagrangian dynamics method 16 to transform the data of the cell to particles resided in it. The detailed implementation is presented in Algorithm 4.
HACP3: pre-processing and post-processing tools
Due to the absence of uniform pre-processing and post-processing tools for the multiscale simulation, we design our HACP3 tools that generated in the SALOME platform using CORBA technique. The HACP3 tools can facilitate field users defining coupling configuration and analyzing simulation results.
User defines couple dict
In the original launch of OpenFOAM and LAMMPS simulation, users should define the simulation parameters and control parameters to start one case. In order to perform HAC simulation, using the OpenFOAM launching mechanism as the base, HACP3 tools interact with field users to define coupling simulation parameters and control parameters and generate the coupling instruction file hacCoupleProperties. We add a new module COUPLEEXPORT to SALOME based on the original GEOM and SMESH module, and the detailed implementation can be referenced in Wang et al. 30 We present parts of the associated coupling parameters and control parameters in Table 1 to guide the HAC coupling simulation.
Input parameters for HACPar simulation.
MD: molecular dynamics; CFD: computational fluid dynamics; HACPar: hybrid atomistic–continuum parallel coupling framework.
Multiscale visualization
There are two kinds of simulation methods involved in one coupling simulation. Each of these will output different data information and files. Efficiently resolving these files will help experts analyzing the simulation results and advancing research. The simulation data on the atomistic part are particle velocities, particle positions, and so on while on the continuum part are velocities of cells. This function is implemented by adding a new module DATASTDL 31 to the SALOME platform as part of our HACP3 tools.
We transform the microscopic data from the atomistic solver to macroscopic data using computer-aided manufacturing (CAM) 32 averaging method. Each process produces time frames of simulation results for the continuum solver and the atomistic solver. We should transform these data using Coupling Data Transformation by the aid of VTK file format. The uniform files that contain both macroscopic data and transformed macroscopic data are shown in the PARAVIEW module of the SALOME platform to help field users for further research.
Advanced coupling-oriented parallel optimization
In section “Basic design and implementation,” we design the basic framework of our HACPar but less of parallel consideration. Even though the HAC method greatly improves the simulation efficiency compared to the full molecular simulation, coupling-oriented issues are still should be carefully considered. In this section, we present the deep parallel optimization of our HACPar coupling framework including the organization of parallel processes, highly efficient particle insertion algorithm, and coupling-oriented load balancing.
Parallel communication interface
The parallel organization of HACPar is based on the original parallel mechanism of OpenFOAM, using MPI to generate communication world and send and receive message. In OpenFOAM, there is only one communication world, intra-Comm, that is, MPI_COMM_WORLD. In order to support coupling simulation, we should define the communication world for OpenFOAM, LAMMPS, and associated coupling procedures. For the physical statistic sake, we should initiate the atomistic region more than once to reduce the sampling noise; therefore, we bring in multi-copy atomistic regions.
In the HAC coupling simulation, the atomistic region and the continuum region are decomposed to each process through certain decomposition method. For the continuum part, let M be the original mesh, and each process will be allocated a sub-mesh
Let

Smart sending mechanism between CFD and MD communicators including m MD communicators and 1 CFD communicator. Copy 0 serves as the master copy.
We define the copy 0 of the atomistic region as the master copy, and others are slave copies. The sending and receiving data between these two solvers are based on smart sending mechanism. The key idea of smart sending mechanism is initiated by the source and destination of each data message at the beginning of the coupling simulation. Each bin in the overlap region has its CFD owner and its MD owner. Each process will tally the owners of each local bin and generate the sending and receiving router between two solvers. The CFD process and the MD process that share overlap bins are defined as related.
We should also pay attention on the mapping schemes between processes and computational nodes. Due to the major computational time resided in the MD part, the HACPar launches many processes which consist of one CFD communicator and several MD communicators. These processes should be distributed to multiple computing nodes for parallel processing. There are two different types of communications: intergroup and intragroup communications; additionally, the overheads are variant between the communications of different computing nodes and within one node. Basically, communication between two computing nodes needs to send and receive messages through the network interface, yet the intra-node communication could be completed by memory copy; therefore, the intra-node communication should be faster than inter-node communications. In principle, we should map the processes with more communications to the same computing node to reduce the overhead.
Particle insertion interface
In order to cope with the mass exchanging between the continuum region and the atomistic region, we need the particle insertion algorithm to handle this issue. In the previous research, Delgado-Buscalioni and Coveney 33 propose a sequential algorithm USHER to handle particle insertion into dense fluid. Neumann and Tchipev 34 implement a parallel version on the link-cell organization. In our previous research, 20 we find that the efficiency of the particle insertion algorithm seriously impacts the efficiency of the whole coupling simulation. Therefore, we design a parallel version of particle insertion algorithm to remarkably improve the performance in the multiscale simulation, named WI-USHER. More details of the theoretical idea can be found in our previous work. 20
In this section, we present the detailed design of our WI-USHER algorithm for the Particle Insertion Interface as depicted in Algorithm 6. The implementation procedure of the WI-USHER algorithm is to profile the particle message resided in the inserted region with finer grids, use the exclusive rules to eliminate those black grids, and combine the local energy potential to find the target position in the left white grids.
Coupling-oriented load balancing
There are crucial difference for load balancing between the HAC multiscale method and the original single-scale method due to the existence of the overlap region and the coupling procedures. Due to the major simulation time focus on the atomistic solver, the default parallel decomposition method of the atomistic region cannot meet the need of coupling-oriented load balancing. In our previous research, 35 the particles in the overlap region are named as coupling particle, while other particles are named as normal particle. The Coupling-Oriented Load Balance module receives the user configuration of the simulation domain, calculates the workload of the particle simulation and uses the guiding model to decide the proper decomposition style of the particle region. The decomposition of the atomistic region and the overlap region, that is, the particle region, are defined in the following three styles, one direction style, default style, and adjust style. The sub-domain of each process in adjust style can be unequal (Figures 4 and 5).

Detailed coupling procedures operated on normal particles (yellow dots) and coupling particles (yellow dots with orange circle). Normal particles reside in the atomistic region, while coupling particles in the overlap region.

Three kinds of decomposition style for the particle region, taking degree of parallelism 12 as an example. The one direction style is decomposed in the y direction as an example, and the adjust style could have unequal sub-domain.
This load balancing technique takes the coupling procedures into account and properly deals with coupling-oriented decomposition. By taking the proper decomposition of the particle region into account, we can further improve the simulation efficiency of the HAC simulation.
Experiments
In this section, we verify our HACPar framework through benchmark cases and also illustrate scaling and efficiency of our HACPar framework.
Platforms
In the experiments, we use an HPC cluster situated in the State Key Laboratory of High Performance Computing. 36 This computing platform consists of hundreds of computing nodes, and each node contains 12 Intel Xeon 2.1-GHz E5-2620 central processing unit (CPU) cores and a total memory of 16 GB.
Validation
In this section, we validate our framework using benchmark cases by comparing simulation results with the analytical solution, that is, the typical no-slip sudden start Couette flow proposed in O’Connell and Thompson. 16
The analytical solution for a sudden start Couette flow is given by equation (4)
where
We initiate the input simulation domain and associating parameters using our pre- and post-processing tool HACP3 as used by Troian and Thompson.
21
The height of the channel is

Snapshot of the domain decomposition configuration of the sudden start Couette case. The upper part is the continuum region, while the lower part is the atomistic region. The overlap region is not shown here.

Snapshot of the coupling parameter configuration in HACP3, taking parts of configuration process as an example.
Initially, the mean fluid is zero in the whole simulation domain. At

Velocity evolution profiles averaged over five time intervals compared with the analytical solution.

Snapshot of the unified post-visualization of the sudden start Couette case, using the transformed unified VTK files.
Efficiency
We test the efficiency of our HACPar framework from two aspects, that is, particle insertion efficiency and coupling-oriented load balancing. We take the channel flow past the rough wall
38
with mass flux across the MD-continuum interface as depicted in Figure 10 into account. The simulation domain is

Diagram of channel flow past the rough wall.
The WI-USHER algorithm performs far superior to the USHER algorithm, and the percentage decreases to only 3.2% of the total simulation time. The substantial decrease in the average force evaluation times leads to the alleviation of the imbalance of parallel particle insertion. We list the detailed percentage in Table 2.
Details of parallel time-consuming for each component.
Next, we test the parallel decomposition method derived from our Coupling-Oriented Load Balance module against the default decomposition method. The degree of parallelism is 12. Under above configuration of the simulation domain, the default decomposition style of the particle domain is
Different decomposition styles for the particle region.
The desired decomposition style we obtained is the one direction with
Details of communication and coupling time-consuming ratio.
Scalability
In order to test the scalability of our HACPar framework, we consider a 3D benchmark case as depicted in Figure 11 with two different setups as listed in Table 5 to evaluate the parallel performance of our HACPar framework. The geometry is a 3D channel surrounded by a solid planar wall. The top and bottom walls are constructed in the z–x plane, and the side walls are in the y–x plane. Periodic boundary condition is applied on the inlet and outlet of the channel in the x direction, while directions y and z are wall. The central part of the simulation domain is calculated by the atomistic solver and the other part is by continuum solver.

Schematic diagram of 3D simulation geometry and the atomistic region located in the central part of the simulation domain.
Description of two different setups.
MD: molecular dynamics; CFD: computational fluid dynamics.
We present strong scaling experiments for these two setups on our experiment platform. The speedups are shown in Figures 12 and 13. The configuration of our platform is 12 cores on one node. During these two tests, we perform the degree of parallelism from 1 to 1536. We can figure out that due to the bring-in coupling overhead achieves good scaling, our HACPar platform achieves quite scaling results and retains the ability of stronger scalability for further parallel optimization.

Strong scaling for the small setup flow with degree of parallelism from 1 to 384. The solid line is the ideal speedup, while the red dot line is the speedup of HACPar.

Strong scaling for the big setup flow with degree of parallelism from 1 to 1536. The solid line is the ideal speedup, while the red dot line is the speedup of HACPar.
Conclusion
In this article, we propose an HAC parallel coupling framework, HACPar, based on OpenFOAM, LAMMPS, and SALOME. In order to provide an effective and easy-to-use software framework for field users, we first abstract the software diagram of the HAC coupling framework based on geometric decomposition and present implementation of coupling interfaces, data structures, and so on. Due to the limited research of coupling-oriented parallel issue, we present deep research on the coupling-oriented parallel issues including decomposition of the communication world, smart data exchanging, coupling-oriented load balancing, and the parallel optimized particle insertion algorithm. The benchmark cases verify the correctness and efficiency of our HACPar framework. The overhead of associating coupling procedures is limited to generally 13%. The strong scaling test also shows the potential of our HACPar to perform larger parallelism simulation.
The optimization techniques of coupling-oriented parallel issues should be further investigated. The Coupling-Oriented Load Balance module is currently a preliminary model and needed to be further designed. Other optimization techniques such as communication hiding could be effective to further improve the scalability of HACPar. We also aim to test a revised version of our HACPar framework for thousands of cores in the future.
Footnotes
Academic Editor: Xiaotun Qiu
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Key Research and Development Program of China (no. 2016YFB0201301), Science Challenge Project (no. JCKY2016212A502), and the open fund from the State Key Laboratory of High Performance Computing (grant nos 201303-01, 201503-01, and 201503-02).
