Abstract
A Resource and Job Management System (RJMS) is a crucial piece of system software in the HPC stack. It is responsible for efficiently delivering computing power to applications in supercomputing environments. Its core intelligence lies in its resource selection techniques, which find the most suitable resources on which to schedule users’ jobs. This article introduces a new method that takes into account both the network topology of the machine and the communication pattern of the application to determine the best choice among the available nodes of the platform. To validate our approach, we integrate this algorithm as a plugin for the Simple Linux Utility for Resource Management (SLURM), a well-known and widespread RJMS. We assess our plugin with different optimization schemes by comparing it with the default topology-aware S
