Abstract: Process mining is gaining popularity in business process analysis, including in heavy industry. It requires a specific data format called an event log, whose basic structure includes a case identifier (case ID), an activity (event) name, and a timestamp. For industrial processes, data is very often provided by a monitoring system as time series of low-level sensor readings. This data cannot be used directly for process mining because activities are not explicitly marked in it and, sometimes, no case ID is provided. We propose a novel rule-based algorithm for pattern identification that detects case IDs by identifying significant changes in the short-term mean values of a selected variable. We demonstrate our solution on a mining use case. We compare the computed results (identified patterns) with expert labels of the same dataset. Experiments show that the developed algorithm correctly detects case IDs in most cases, in datasets with and without outliers, reaching F1 scores of 96.8% and 97%, respectively. We also evaluate our algorithm on a dataset from the manufacturing domain, reaching an F1 score of 92.6%.
What problem does this paper attempt to address?
This paper addresses the following problem: **how to identify the case ID (case identifier) that an event log requires from time-series sensor data**, so that process mining (PM) can be carried out. Raw sensor data generated in industrial processes is usually provided as low-level time series without explicit activity labels or case IDs, so it cannot be used directly for process mining. A method is therefore needed to detect case IDs automatically from such raw sensor data.
### Problem Background
1. **Requirements for Process Mining**: Process mining is a technique for discovering, monitoring, and improving business processes by analyzing event logs. Event logs usually contain three basic elements: case ID, activity name, and timestamp. However, industrial sensor data is usually unstructured and lacks these necessary elements.
2. **Characteristics of Sensor Data**: Sensor data in industrial processes is usually time-series data, recording the states of machines or devices at different points in time. This data has no explicit activity labels and no case ID, so it cannot be used directly for process mining.
3. **Limitations of Existing Methods**: Most existing event-association methods assume that defined events or activities already exist in the data, or they rely on known process models. With raw sensor data these assumptions often do not hold, which makes existing methods difficult to apply.
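To make the gap between the two data shapes concrete, the following is a minimal sketch, not taken from the paper, that contrasts a raw sensor reading with an event-log entry. The class names, the field names, and the sample values (`motor_current`, `cutting`, `cycle-1`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SensorReading:
    """One low-level sample as a monitoring system might emit it (hypothetical schema)."""
    timestamp: datetime
    variable: str
    value: float


@dataclass(frozen=True)
class Event:
    """One event-log entry with the three basic elements named in the text."""
    case_id: str
    activity: str
    timestamp: datetime


def missing_event_log_fields(record) -> list[str]:
    """Return the basic event-log fields that a record does not provide."""
    required = ("case_id", "activity", "timestamp")
    return [name for name in required if not hasattr(record, name)]


# Hypothetical sample values, for illustration only.
reading = SensorReading(datetime(2024, 1, 1, 8, 0, 0), "motor_current", 41.7)
event = Event("cycle-1", "cutting", datetime(2024, 1, 1, 8, 0, 0))

print(missing_event_log_fields(reading))  # the sensor sample lacks case_id and activity
print(missing_event_log_fields(event))    # the event-log entry lacks nothing
```

The sensor sample carries only a timestamp and a value, which is why a case ID has to be inferred before process mining tools can consume the data.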
### Main Contributions of the Paper
To solve the above problems, the authors propose a case ID detection algorithm based on time-series patterns. The algorithm detects case IDs by identifying significant changes in time-series data and is especially suitable for industrial processes that operate periodically. Specifically:
- **Algorithm Principle**: The algorithm is rule-based. It identifies patterns from short-term statistical characteristics of a selected time-series variable (significant changes in its short-term mean) and maps them to case IDs.
- **Application Scenario**: The authors use the operation of a longwall coal mining machine in the mining industry as the example that demonstrates the effectiveness of the algorithm.
- **Experimental Results**: Compared against expert-labeled data, the algorithm achieves F1 scores of 96.8% on the dataset with outliers and 97% on the dataset without outliers. On a dataset from the manufacturing domain it reaches an F1 score of 92.6%.
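The idea behind the algorithm principle above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the window size, the threshold, and the arm/disarm rule (a significant rise in the short-term mean opens a new case, a significant drop re-arms the detector) are all hypothetical choices, as are the function names.

```python
from statistics import mean


def short_term_mean_shift(values, i, window):
    """Difference between the mean of the next `window` samples and the previous ones."""
    before = values[max(0, i - window):i]
    after = values[i:i + window]
    if not before or not after:
        return 0.0
    return mean(after) - mean(before)


def assign_case_ids(values, window=3, threshold=8.0):
    """Label every sample with a case ID (0 = before the first detected case).

    Assumed rule: a rise of the short-term mean above `threshold` starts a new
    case; the detector re-arms only after a drop below `-threshold`, so one
    rising edge cannot open several cases.
    """
    case_ids = []
    case_id = 0
    armed = True
    for i in range(len(values)):
        shift = short_term_mean_shift(values, i, window)
        if armed and shift > threshold:
            case_id += 1      # significant rise: a new cycle begins here
            armed = False
        elif not armed and shift < -threshold:
            armed = True      # significant drop: the current cycle has ended
        case_ids.append(case_id)
    return case_ids


# Synthetic signal with two operating cycles (hypothetical values).
signal = [0, 0, 0, 0, 10, 10, 10, 10, 0, 0, 0, 0, 10, 10, 10, 10, 0, 0, 0]
labels = assign_case_ids(signal, window=3, threshold=8.0)
print(labels)
```

On this synthetic signal the first four samples stay in case 0, and a new case ID is assigned at each of the two rising edges. Real sensor data would need the window and threshold tuned to the variable being monitored, and outliers may call for additional filtering.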
### Summary
The core problem of this paper is to develop a method that automatically detects case IDs in raw sensor data so that the data can be used for process mining. The proposed algorithm, based on time-series patterns, performs well on datasets from both the mining and manufacturing domains and is particularly well suited to periodic industrial processes.