A Conceptual Framework for Understanding Event Data Quality for Behavior Analysis

Xixi Lu, Dirk Fahland
Abstract: Process mining aims to derive useful insight for improving business process efficiency and effectiveness. These mining techniques rely heavily on event data, in the form of event logs, to provide accurate diagnostic information. The quality of such event data therefore has a large effect on the quality and trustworthiness of the conclusions drawn from the mining analysis and of the subsequent business decisions. Traditional data quality frameworks focus on identifying quality dimensions extensively from a data perspective and on improving the overall data quality in the long term. While long-term data quality improvement is certainly useful, it may not help analysts in practice, who often face the task of analyzing a given log of lower quality in the short term. As a result, when the user conducts a certain analysis (e.g., process discovery), these quality frameworks provide little guidance for assessing or improving the quality of the data for that analysis [1,2,7]. To the best of our knowledge, only the work in [7] presented event data quality issues as specific patterns recurring in logs and discussed their possible effects on mining results from an analysis perspective. In the past few years, we have developed numerous approaches to deal with event logs of low quality, for which existing mining techniques yield no conclusive results. Three main approaches have emerged: (i) a trace clustering technique based on behavior similarity, which allows the user to identify process variants and then explore these variants to discover more precise and conclusive models [4]; (ii) a conformance checking technique using partial order traces and alignments, for cases where the ordering of events in a log is untrustworthy [3]; (iii) a label refinement technique, for cases where the labels of events are imprecise and lead to inconclusive models [5].
However, as each approach is dedicated to tackling a particular event data quality issue from an analysis perspective, an overview for understanding the quality issues is missing. In this position paper, we discuss a conceptual framework to help users understand how these quality issues could be presented and interrelated, how our approaches may be positioned, and how future data quality issues may be classified. The conceptual framework is visualized as a table: the columns