Abstract: Automated process discovery from event logs is a key component of process mining, allowing companies to gain meaningful insights into their business processes. Despite significant research, existing methods struggle to balance four important quality dimensions (fitness, precision, generalization, and complexity) and are limited when dealing with complex loop structures. This paper introduces Bonita Miner, a novel approach to process model discovery that generates behaviorally accurate Business Process Model and Notation (BPMN) diagrams. Bonita Miner combines an advanced filtering mechanism for Directly-Follows Graphs (DFGs) with new algorithms designed to capture concurrency, splits, and loops. It aims to balance these four metrics as far as possible, including when loops are present, which remains a challenge for existing work. Our approach produces models that are simpler and more reflective of the behavior of real-world processes, including complex loop dynamics. Empirical evaluations using real-world event logs demonstrate that Bonita Miner outperforms existing methods in fitness, precision, and generalization, while maintaining low model complexity.
### What problems does this paper attempt to solve?
This paper aims to solve several key challenges encountered when automatically discovering business process models from event logs, especially when dealing with processes containing complex loop structures. Specifically, the paper attempts to solve the following problems:
1. **Balancing four quality dimensions**:
- **Fitness**: The model should be able to reproduce the behavior recorded in the event log.
- **Precision**: The model should not allow behavior that is unrelated to the actual process.
- **Generalization**: The model should be able to cover behaviors that do not appear in the log but may belong to the process.
- **Complexity**: The model should be as simple as possible and easy to understand and maintain.
2. **Handling complex loop structures**:
Existing methods perform poorly on processes that contain complex loop structures. For example, existing algorithms cannot accurately identify and represent loops containing parallel blocks (as shown in Figure 1b). They typically misrepresent such parallel blocks as separate self-loops (as shown in Figure 1a), which reduces the accuracy of the model.
3. **Generating concise and accurate BPMN diagrams**:
The paper proposes a new method, Bonita Miner, which generates behaviorally accurate BPMN diagrams and performs well on complex loop structures. Bonita Miner addresses the limitations of existing methods across the four quality dimensions by introducing an advanced filtering mechanism and new algorithms to capture concurrency, branching, and looping.
### Overview of the solution
The main improvements of Bonita Miner include:
- **Depth-First Algorithm (DFA)**: Used to construct split and merge nodes, ensuring the simplicity and accuracy of the model when dealing with complex loop structures.
- **Advanced filtering of the Directly-Follows Graph (DFG)**: Removing unnecessary concurrency relationships and loops, simplifying the DFG to better identify and handle complex process structures.
- **Treating loops as blocks**: Treating loops as blocks with one or more sources and targets, thereby reducing unnecessary gateways and relationships and further simplifying the model.
Through these improvements, Bonita Miner maintains high fitness, precision, and generalization when dealing with complex loop structures, while also reducing model complexity. Experimental results show that Bonita Miner outperforms three existing state-of-the-art baseline methods on real-world event logs.
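The DFG filtering idea described above can be illustrated with a small Python sketch. It is not Bonita Miner's actual filter, whose rules this summary does not describe. It assumes a hypothetical rule: an edge is kept only if its frequency is at least a fraction `ratio` of the most frequent outgoing edge of its source activity. The names `filter_dfg`, `self_loops`, and `ratio`, and the edge counts, are illustrative assumptions.

```python
from collections import Counter


def filter_dfg(edge_counts, ratio=0.2):
    """Drop infrequent edges (hypothetical rule, not the paper's filter).

    An edge (src, dst) is kept if its count is at least `ratio` times the
    count of the most frequent edge leaving `src`.
    """
    max_out = {}
    for (src, _dst), count in edge_counts.items():
        max_out[src] = max(max_out.get(src, 0), count)
    return {
        edge: count
        for edge, count in edge_counts.items()
        if count >= ratio * max_out[edge[0]]
    }


def self_loops(edge_counts):
    """Return the activities that directly follow themselves."""
    return sorted(src for (src, dst) in edge_counts if src == dst)


# Made-up directly-follows frequencies for four activities.
edge_counts = Counter({
    ("a", "b"): 50,
    ("a", "c"): 5,
    ("b", "b"): 20,
    ("b", "d"): 45,
    ("c", "d"): 5,
})

filtered = filter_dfg(edge_counts, ratio=0.2)
print(sorted(filtered))      # the rare edge ("a", "c") is gone
print(self_loops(filtered))  # activities with a self-loop
```

A purely local threshold like this can leave activities disconnected (here `c` loses its only incoming edge), so a practical filter would also need to preserve reachability from the start event.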
### Formula presentation
The paper relies on several formal definitions, including the following:
- **Definition of the Directly-Follows Graph (DFG)**:
\[
DFG=(V, E)
\]
where:
- \( V \) represents a finite set of vertices, each vertex corresponding to an activity or a start/end event.
- \( E\subseteq V\times V \) represents a set of directed edges, and each edge \((u, v)\in E \) means that \( v \) can be executed immediately after \( u \).
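As an illustration of this definition, the following Python sketch derives \( V \) and \( E \) from a list of traces. The function name `build_dfg`, the artificial `START` and `END` labels, and the example traces are assumptions made for this sketch and are not taken from the paper.

```python
def build_dfg(traces, start="START", end="END"):
    """Build DFG = (V, E) from traces, each a list of activity names.

    An edge (u, v) is added whenever v occurs immediately after u in some
    trace. Artificial start and end events bracket every trace.
    """
    vertices = {start, end}
    edges = set()
    for trace in traces:
        path = [start, *trace, end]
        vertices.update(trace)
        # Pair each element with its immediate successor.
        edges.update(zip(path, path[1:]))
    return vertices, edges


# Made-up event log: three traces over the activities a, b, c.
traces = [
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["a", "b", "b", "c"],
]

V, E = build_dfg(traces)
print(sorted(V))
print(sorted(E))
```

Storing \( E \) as a set of pairs mirrors \( E \subseteq V \times V \) directly. A frequency-aware variant would count occurrences of each pair rather than only recording their presence.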
- **Definition of Path**:
\[
P_{u,v}=\langle (v_1, v_2),\ldots,(v_{n - 1}, v_n)\rangle
\]
representing the unique sequence of edges from \( u \) to \( v \), where:
- \(\forall 1\leq i, j\leq n - 1, i\neq j, n\geq 2\)
- \((v_i, v_{i + 1})\in E\)
- \(v_1 = u\), \(v_n = v\)
- **Definition of Cycle**:
\[
C_u=\langle (u, v_1),\ldots,(v_n, u)\rangle
\]
representing the path starting and ending at \( u \), where:
- \(\forall 1\leq