ScSF: Scheduling Simulation Framework
Workflow-aware scheduling research required tools to test scheduling algorithms and to simulate real HPC systems. We created ScSF, an open-source Scheduling Simulation Framework that covers every step of simulation-based scheduling research. ScSF provides capabilities for workload modeling, workload generation, system simulation, comparative workload analysis, and experiment orchestration. It is designed to run on distributed computing infrastructure, which makes large-scale tests practical.
We demonstrate ScSF through a case study in which we developed new techniques to manage scientific workflows in a batch scheduler. The evaluation consisted of 1728 experiments, equivalent to 33 years of simulated time, which were run over two months on a deployment of ScSF across a distributed infrastructure of 17 compute nodes. Finally, we analyzed the experimental results with ScSF to demonstrate that our technique minimizes workflow turnaround time without over-allocating resources. This work also produced a list of lessons learned from our experience, intended to inform future large-scale simulation studies using ScSF and similar frameworks.
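The scale of this evaluation can be put in perspective with some back-of-the-envelope arithmetic. The Python sketch below derives averages from the reported totals only. It assumes 365.25-day years, months of one twelfth of a year, and all 17 nodes kept busy for the whole period; because the reported totals are themselves rounded, the results are approximations, not measured figures.

```python
# Back-of-the-envelope figures derived only from the totals reported above.
# Assumptions (not from the source): 365.25 days per year, a month is 1/12
# of a year, and all compute nodes were busy for the entire two months.
DAYS_PER_YEAR = 365.25
DAYS_PER_MONTH = DAYS_PER_YEAR / 12

experiments = 1728
compute_nodes = 17
simulated_days = 33 * DAYS_PER_YEAR      # total simulated time
wall_clock_days = 2 * DAYS_PER_MONTH     # total real time of the campaign

# Average simulated time covered by a single experiment.
sim_days_per_experiment = simulated_days / experiments
# Average number of experiments handled by each node (assumes even spread).
experiments_per_node = experiments / compute_nodes
# Simulated days produced per wall-clock day, for the whole deployment
# and per node.
speedup_overall = simulated_days / wall_clock_days
speedup_per_node = speedup_overall / compute_nodes

print(f"simulated days per experiment: {sim_days_per_experiment:.1f}")
print(f"experiments per compute node:  {experiments_per_node:.1f}")
print(f"simulated/wall-clock ratio:    {speedup_overall:.0f}x overall, "
      f"{speedup_per_node:.1f}x per node")
```

Under these assumptions, each experiment covers roughly a week of simulated time on average, and the deployment as a whole simulated about 198 days for every wall-clock day, or roughly 11.6 per node.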
ScSF is an outcome of a collaboration between researchers of the Data Science and Technology department at Lawrence Berkeley National Laboratory and the Distributed Systems group at Umeå University to investigate new scheduling algorithms for scientific workflows in HPC.