Infrastructure Strategy to Support Collaborations @ LBNL

WoAS - Workflow-Aware Scheduling

Scientific workflows are increasingly common in the workloads of current High Performance Computing (HPC) systems. However, HPC schedulers do not incorporate workflow-specific mechanisms beyond the capacity to declare dependencies between their jobs. Thus, workflows are run as sets of batch jobs with dependencies, which induces long intermediate wait times and, consequently, long workflow turnaround times. Alternatively, to reduce their turnaround time, workflows may be submitted as single pilot jobs that are allocated their maximum required resources for their entire runtime. Pilot jobs achieve shorter turnaround times but reduce the HPC system's utilization because resources may idle during the workflow's execution.
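The trade-off between the two submission strategies can be illustrated with a small back-of-the-envelope calculation. The sketch below is not part of WoAS; it assumes a hypothetical three-stage workflow whose stages run strictly one after another, and it assumes every submission waits the same fixed time in the queue (in practice, wait times vary with job size and system load). The names `Stage`, `workflow`, and `QUEUE_WAIT_H` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    cores: int        # cores the stage actually uses
    runtime_h: float  # stage runtime in hours


# Hypothetical workflow: a wide stage, a long narrow stage, a medium stage.
workflow = [
    Stage(cores=480, runtime_h=1.0),
    Stage(cores=24, runtime_h=4.0),
    Stage(cores=240, runtime_h=1.0),
]

# Assumed queue wait per submission, in hours (a simplification).
QUEUE_WAIT_H = 2.0


def chained_core_hours(stages):
    """Core-hours allocated when each stage is its own batch job."""
    return sum(s.cores * s.runtime_h for s in stages)


def pilot_core_hours(stages):
    """Core-hours allocated when one pilot job holds the peak core count
    for the whole workflow runtime."""
    peak_cores = max(s.cores for s in stages)
    total_runtime = sum(s.runtime_h for s in stages)
    return peak_cores * total_runtime


def chained_turnaround_h(stages, wait_h):
    """Turnaround when every stage waits in the queue before it runs."""
    return sum(wait_h + s.runtime_h for s in stages)


def pilot_turnaround_h(stages, wait_h):
    """Turnaround when the workflow waits in the queue only once."""
    return wait_h + sum(s.runtime_h for s in stages)


used = chained_core_hours(workflow)
held = pilot_core_hours(workflow)
print(f"chained jobs: {used:.0f} core-hours, "
      f"turnaround {chained_turnaround_h(workflow, QUEUE_WAIT_H):.0f} h")
print(f"pilot job:    {held:.0f} core-hours, "
      f"turnaround {pilot_turnaround_h(workflow, QUEUE_WAIT_H):.0f} h")
print(f"idle core-hours inside the pilot job: {held - used:.0f}")
```

Under these assumptions the pilot job finishes sooner because it queues only once, but it holds far more core-hours than the stages actually use. That gap is the utilization loss described above.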

We present a workflow-aware scheduling (WoAS) system that enables existing scheduling algorithms to exploit fine-grained information on a workflow's resource requirements and structure without modification. The current implementation of WoAS is integrated into Slurm, a widely used HPC batch scheduler. We evaluate the system in a simulator, using real and synthetic workflows together with a synthetic baseline workload that captures job patterns observed over three years of workload data from Edison, a large supercomputer hosted at the National Energy Research Scientific Computing Center. Our results show that WoAS reduces workflow turnaround times and improves system utilization without significantly slowing down conventional jobs.

WoAS is an outcome of a collaboration between researchers of the Data Science and Technology department at Lawrence Berkeley National Laboratory and the Distributed Systems group at Umeå University to investigate new scheduling algorithms for scientific workflows in HPC.
