Abstract
One of the most important aspects of supercomputer development in the post-Moore era is the interconnect technology that unites a multitude of processing elements into a well-synchronized computing system. Novel supercomputer interconnects require careful benchmarking and must satisfy the requirements imposed by modern hardware trends. GPU-based heterogeneous computing is one of the most important current avenues for building high-performance computing systems, and support for GPU-aware MPI is a requirement for any competitive interconnect. In this paper, we describe a GPU-aware MPI implementation for the Angara interconnect based on the UCX API. We present a performance analysis of point-to-point, MPI_Bcast, and MPI_Reduce operations, as well as of the rocHPL benchmark and a typical biomolecular model within the LAMMPS molecular dynamics code. The deployment of the Desmos supercomputer, equipped with both Angara and InfiniBand FDR, allows us to make an accurate comparison of these two types of interconnect, using the latter as a reference.
