Flow-Bench: A Dataset for Computational Workflow Anomaly Detection

George Papadimitriou, Hongwei Jin, Cong Wang, Rajiv Mayani, Krishnan Raghavan, Anirban Mandal, Prasanna Balaprakash, Ewa Deelman
2024-06-14
Abstract: A computational workflow, also known as workflow, consists of tasks that must be executed in a specific order to attain a specific goal. Often, in fields such as biology, chemistry, physics, and data science, among others, these workflows are complex and are executed in large-scale, distributed, and heterogeneous computing environments prone to failures and performance degradation. Therefore, anomaly detection for workflows is an important paradigm that aims to identify unexpected behavior or errors in workflow execution. This crucial task to improve the reliability of workflow executions can be further assisted by machine learning-based techniques. However, such application is limited, in large part, due to the lack of open datasets and benchmarking. To address this gap, we make the following contributions in this paper: (1) we systematically inject anomalies and collect raw execution logs from workflows executing on distributed infrastructures; (2) we summarize the statistics of new datasets, and provide insightful analyses; (3) we convert workflows into tabular, graph and text data, and benchmark with supervised and unsupervised anomaly detection techniques correspondingly. The presented dataset and benchmarks allow examining the effectiveness and efficiency of scientific computational workflows and identifying potential research opportunities for improvement and generalization. The dataset and benchmark code are publicly available at https://poseidon-workflows.github.io/FlowBench/ under the MIT License.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses anomaly detection in computational workflows (often simply called workflows) executed in distributed computing environments. Such workflows are common in fields such as biology, chemistry, physics, and data science. As they grow in scale and complexity, they become more exposed to hardware and software failures that cause errors and performance degradation. Anomaly detection identifies unexpected behavior or errors in workflow execution and can therefore improve reliability.

The paper observes that, although anomaly detection techniques exist, their application to workflows is limited in large part by the lack of open datasets and benchmarks. To address this gap, it makes the following contributions:

1. It systematically injects anomalies and collects raw execution logs from workflows running on distributed infrastructures.
2. It summarizes the statistics of the new datasets and analyzes them.
3. It converts workflows into tabular, graph, and text data and benchmarks supervised and unsupervised anomaly detection techniques on each representation.

The dataset is built from 12 representative workflows drawn from different domains. By comparing how the techniques perform on the tabular, graph, and text representations, the paper shows where each one applies and where it falls short in scientific workflow scenarios. The datasets and benchmark code are publicly available under the MIT License.

In conclusion, the paper aims to advance anomaly detection for computational workflows by providing datasets and benchmarks that support future research and tool development.
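
To make the idea of tabular, graph, and text representations more concrete, the following is a minimal sketch in plain Python. It is not the Flow-Bench API or the authors' pipeline: the job records, field names (`type`, `runtime_s`, `parents`, `label`), and helper functions are all hypothetical, and the detector is a simple robust z-score on runtime, swapped in here as a stand-in for the unsupervised techniques the paper benchmarks.

```python
from collections import defaultdict
from statistics import median

# Hypothetical job records for one workflow run. The schema is an
# assumption made for illustration, not the Flow-Bench data format.
# label == 1 marks a job with an injected anomaly.
JOBS = [
    {"name": "split", "type": "split", "parents": [], "runtime_s": 12.0, "label": 0},
    {"name": "align_1", "type": "align", "parents": ["split"], "runtime_s": 40.0, "label": 0},
    {"name": "align_2", "type": "align", "parents": ["split"], "runtime_s": 41.5, "label": 0},
    {"name": "align_3", "type": "align", "parents": ["split"], "runtime_s": 95.0, "label": 1},
    {"name": "align_4", "type": "align", "parents": ["split"], "runtime_s": 39.0, "label": 0},
    {"name": "merge", "type": "merge",
     "parents": ["align_1", "align_2", "align_3", "align_4"],
     "runtime_s": 15.0, "label": 0},
]


def to_table(jobs):
    """Tabular view: one feature row per job."""
    return [
        {"name": j["name"], "type": j["type"],
         "runtime_s": j["runtime_s"], "n_parents": len(j["parents"])}
        for j in jobs
    ]


def to_edges(jobs):
    """Graph view: directed (parent, child) edges of the workflow DAG."""
    return [(p, j["name"]) for j in jobs for p in j["parents"]]


def to_text(jobs):
    """Text view: one short sentence per job."""
    sentences = []
    for j in jobs:
        deps = ", ".join(j["parents"]) or "no parents"
        sentences.append(
            f"job {j['name']} of type {j['type']} "
            f"ran for {j['runtime_s']} seconds after {deps}"
        )
    return sentences


def robust_scores(rows, min_group=3):
    """Modified z-score of runtime within each job type (median and MAD).

    Types with fewer than min_group jobs, or with zero MAD, get score 0.0
    because there is no usable baseline to compare against.
    """
    by_type = defaultdict(list)
    for r in rows:
        by_type[r["type"]].append(r["runtime_s"])

    scores = {}
    for r in rows:
        values = by_type[r["type"]]
        if len(values) < min_group:
            scores[r["name"]] = 0.0
            continue
        med = median(values)
        mad = median(abs(v - med) for v in values)
        scores[r["name"]] = 0.6745 * abs(r["runtime_s"] - med) / mad if mad > 0 else 0.0
    return scores


THRESHOLD = 3.5  # common rule of thumb for modified z-scores
scores = robust_scores(to_table(JOBS))
flagged = [name for name, s in scores.items() if s > THRESHOLD]
truth = [j["name"] for j in JOBS if j["label"] == 1]

print("edges:", to_edges(JOBS))
print("first sentence:", to_text(JOBS)[0])
print("flagged:", flagged, "| injected:", truth)
```

In this toy run the three views describe the same execution: the table feeds a feature-based detector, the edge list could feed a graph model, and the sentences could feed a language model. Comparing `flagged` against `truth` mirrors, in miniature, how labels from injected anomalies allow detectors to be benchmarked.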