Abstract: Drawing on the strengths of data science and machine learning, process mining has recently emerged as an effective research approach for process management and its decision support. Bottleneck identification and analysis is a key problem in process mining and is considered a critical component of process improvement. Although previous studies on bottlenecks have been reported, visible gaps remain. Most of these studies identify bottlenecks from a local perspective using quantitative metrics, such as machine operation and resource requirements, which cannot be applied to knowledge-intensive processes. Moreover, the root causes of such bottlenecks have not been given enough attention, which limits the impact of process optimisation. This paper proposes an approach that utilises fusion-based clustering and hyperbolic neural network-based knowledge graph embedding for bottleneck identification and root cause analysis. Firstly, a fusion-based clustering method is proposed to identify bottlenecks automatically from a global perspective, where the execution frequency of each stage in different periods is calculated to reveal the abnormal stage. Secondly, a process knowledge graph is established that represents tasks, organisations, workforce and relation features as hierarchical and logical patterns. Finally, a hyperbolic cluster-based community detection mechanism is developed, based on the process knowledge graph embedding trained by a hyperbolic neural network, to analyse the root cause from a process perspective. Experimental studies using real-world data collected from a multidisciplinary design project revealed the merits of the proposed approach. The proposed approach is not limited to event logs; it can automatically identify bottlenecks without local quantitative metrics and analyse their causes from a process perspective.
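To make the first step more concrete, the following is a minimal sketch of counting how often each stage is executed in each period and flagging a stage that stands out. It rests on several assumptions: the `(stage, period)` record layout, the function names `stage_period_counts` and `flag_abnormal_stages`, the `z_threshold` parameter and the toy data are all hypothetical, and a plain z-score over per-stage totals is used as a simple stand-in for the fusion-based clustering described in the abstract, which is not reproduced here.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def stage_period_counts(events, periods):
    """Count how often each stage is executed in each period.

    `events` is an iterable of (stage, period) pairs. This record layout is
    a hypothetical one chosen for illustration, not taken from the paper.
    """
    counts = defaultdict(Counter)
    for stage, period in events:
        counts[stage][period] += 1
    # One row per stage, one column per period (missing periods count as 0).
    return {stage: [c[p] for p in periods] for stage, c in counts.items()}


def flag_abnormal_stages(matrix, z_threshold=1.0):
    """Flag stages whose total execution count is unusually high.

    Uses a plain z-score over per-stage totals. This is a simple stand-in
    for the fusion-based clustering named in the abstract.
    """
    totals = {stage: sum(row) for stage, row in matrix.items()}
    values = list(totals.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all stages are executed equally often
    return sorted(s for s, t in totals.items() if (t - mu) / sigma > z_threshold)


# Toy data, invented for illustration only.
periods = ["2023-Q1", "2023-Q2", "2023-Q3"]
events = (
    [("concept", "2023-Q1")] * 3
    + [("detail design", "2023-Q1")] * 2
    + [("detail design", "2023-Q2")] * 6
    + [("detail design", "2023-Q3")] * 5
    + [("review", "2023-Q3")] * 2
)

matrix = stage_period_counts(events, periods)
abnormal = flag_abnormal_stages(matrix)
print(matrix)
print(abnormal)
```

In this toy log the "detail design" stage keeps recurring across all three periods, so its total is well above that of the other stages and it is the only one flagged. A real analysis would replace the z-score with the clustering step and would need a justified choice of period length.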
