Abstract
We present the implementation and performance of a class of directionally unsplit Riemann-solver-based hydrodynamic schemes on graphics processing units (GPUs). These schemes, including the MUSCL-Hancock method, a variant of the MUSCL-Hancock method, and the corner-transport-upwind method, are embedded into the adaptive-mesh-refinement (AMR) code GAMER. Furthermore, a hybrid MPI/OpenMP model is investigated, which enables the full exploitation of the computing power in a heterogeneous CPU/GPU cluster and significantly improves the overall performance. Performance benchmarks are conducted on the Dirac GPU cluster at NERSC/LBNL using up to 32 Tesla C2050 GPUs. A single GPU achieves speed-ups of 101 (25) and 84 (22) for uniform-mesh and AMR simulations, respectively, as compared with the performance using one (four) CPU core(s), and the excellent performance persists in multi-GPU tests. In addition, we make a direct comparison between GAMER and the widely adopted CPU code Athena in adiabatic hydrodynamic tests and demonstrate that, with the same accuracy, GAMER is able to achieve two orders of magnitude performance speed-up.
Keywords
Get full access to this article
View all access options for this article.
