Applying Process Mining on Scientific Workflows: a Case Study

Zahra Sadeghibogar,Alessandro Berti,Marco Pegoraro,Wil M.P. van der Aalst
2023-07-06
Abstract:Computer-based scientific experiments are becoming increasingly data-intensive. High-Performance Computing (HPC) clusters are ideal for executing large scientific experiment workflows. Executing large scientific workflows in an HPC cluster leads to complex flows of data and control within the system, which are difficult to analyze. This paper presents a case study where process mining is applied to logs extracted from SLURM-based HPC clusters, in order to document the running workflows and find the performance bottlenecks. The challenge lies in correlating the jobs recorded in the system to enable the application of mainstream process mining techniques. Users may submit jobs with explicit or implicit interdependencies, leading to the consideration of different event correlation techniques. We present a log extraction technique from SLURM clusters, completed with an experimental.
Databases
What problem does this paper attempt to address?