Abstract
Scientific computing centers and private (in-house) cloud data centers do not allocate resources using the pay-as-you-go business model common in commercial clouds. Instead, the system is typically shared by a selected group of users, and the administrator's job is to ensure that resources are shared fairly according to the organization's existing policies. A common approach, especially in batch systems, is to use fairshare-based job prioritization in the scheduler: the prioritization mechanism balances resource consumption so that, over time, each user receives their entitled share of resources. In this work, we present a simulator that mimics the fair-sharing algorithm of a batch system together with its configuration settings. Using a set of experiments, we demonstrate the utility of this tool for tuning fairshare settings in a standard HPC/HTC scheduler and show the impact of (often overlooked) additional options that modify the basic fairshare settings. Furthermore, we introduce the batch system simulator AleaNG, which enables complex studies of how various fair-sharing and scheduling policies affect system performance. Finally, we compare the outputs of both simulators with a real OpenPBS resource manager and show that they simulate fair-sharing and job execution accurately. All findings in this paper are based on our real-world experience of running and optimizing a distributed national computing infrastructure in the Czech Republic.
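The abstract does not state the fairshare formula used by the simulators or by OpenPBS. As a hedged illustration only, the sketch below shows one common formulation of decayed-usage fair-sharing: past usage is multiplied by 0.5 every `half_life_hours` (a hypothetical parameter), and each user's priority factor is 2^(-U/S), the classic fair-share factor documented for Slurm, where U is the user's fraction of total decayed usage and S is the user's normalized share. The names `User`, `decay_usage`, `fairshare_factor`, and `order_queue` are hypothetical; they are not the API of the presented simulator, of AleaNG, or of OpenPBS.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    shares: float       # hypothetical normalized share of the system, e.g. 0.5
    usage: float = 0.0  # hypothetical decayed historical usage, e.g. CPU-hours


def decay_usage(users, elapsed_hours, half_life_hours):
    """Exponentially decay past usage so that old consumption counts less."""
    factor = 0.5 ** (elapsed_hours / half_life_hours)
    for u in users:
        u.usage *= factor


def fairshare_factor(user, users):
    """Classic 2**(-U/S) factor (as in Slurm): 1.0 with no usage, 0.5 when usage equals share."""
    total = sum(u.usage for u in users)
    normalized_usage = user.usage / total if total > 0 else 0.0
    return 2.0 ** (-normalized_usage / user.shares)


def order_queue(jobs, users):
    """Sort pending jobs, given as (owner_name, job_id), by owner's factor, highest first."""
    by_name = {u.name: u for u in users}
    return sorted(jobs, key=lambda job: fairshare_factor(by_name[job[0]], users), reverse=True)
```

For example, with two users holding equal shares of 0.5, where `alice` has consumed 300 CPU-hours and `bob` 100, `bob`'s factor (2^-0.5) exceeds `alice`'s (2^-1.5), so `order_queue([("alice", 1), ("bob", 2)], users)` places `bob`'s job first. The decay half-life is the kind of "additional option" that determines how quickly past consumption stops penalizing a user.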
