Ponder: Online Prediction of Task Memory Requirements for Scientific Workflows
Fabian Lehmann,Jonathan Bader,Ninon De Mecquenem,Xing Wang,Vasilis Bountris,Florian Friederici,Ulf Leser,Lauritz Thamsen
DOI: https://doi.org/10.1109/e-Science62913.2024.10678682
2024-10-10
Abstract:Scientific workflows are used to analyze large amounts of data. These workflows comprise numerous tasks, many of which are executed repeatedly, running the same custom program on different inputs. Users specify resource allocations for each task, which must be sufficient for all inputs to prevent task failures. As a result, task memory allocations tend to be overly conservative, wasting precious cluster resources, limiting overall parallelism, and increasing workflow makespan.
In this paper, we first benchmark a state-of-the-art method on four real-life workflows from the nf-core workflow repository. This analysis reveals that certain assumptions underlying current prediction methods, which typically were evaluated only on simulated workflows, cannot generally be confirmed for real workflows and executions. We then present Ponder, a new online task-sizing strategy that considers and chooses between different methods to cater to different memory demand patterns. We implemented Ponder for Nextflow and made the code publicly available. In an experimental evaluation that also considers the impact of memory predictions on scheduling, Ponder improves Memory Allocation Quality on average by 71.0% and makespan by 21.8% in comparison to a state-of-the-art method. Moreover, Ponder produces 93.8% fewer task failures.
Distributed, Parallel, and Cluster Computing