Explaining the impact of design choices on model quality in predictive process monitoring

Sungkyu Kim, Marco Comuzzi, Chiara Di Francescomarino
DOI: https://doi.org/10.1007/s10844-024-00903-7
2024-11-02
Journal of Intelligent Information Systems
Abstract: When developing a predictive process monitoring (PPM) model, designers face several design choices. These encompass both ML-related concerns, such as which classification or regression model to choose, and PPM-specific concerns, such as how to encode the trace prefixes and which features to generate from the event timestamps. While the literature has seen a few attempts to study how these choices impact the performance of a PPM model, no systematic study on this matter exists. This paper aims to close this gap. Instead of devising a systematic experimental benchmark study, however, we propose a framework that can be instantiated differently depending on the PPM task at hand and other settings. The framework has two building blocks: a user-defined design space exploration strategy, which generates configurations and records the performance they achieve, and explainable artificial intelligence techniques, such as SHAP, which are used to analyze the impact of the design choices on model performance. We present two instantiations of the proposed framework for the two fundamental PPM tasks of next activity prediction and outcome prediction. The results obtained using publicly available event logs are used to derive both general insights regarding the effectiveness of design choices and specific insights based on the characteristics of the event logs used.
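The two building blocks named in the abstract (exploring a design space, then attributing performance to individual design choices) can be illustrated with a minimal sketch. This is not the authors' implementation: the design space, the option names, and the `toy_score` function are hypothetical placeholders with made-up values, and the attribution is an exact baseline Shapley computation written with the standard library rather than a call to the SHAP package.

```python
from itertools import permutations, product

# Hypothetical design space; option names are illustrative assumptions only.
DESIGN_SPACE = {
    "encoding": ["aggregation", "index"],
    "model": ["logistic_regression", "random_forest"],
    "time_features": [False, True],
}


def toy_score(config):
    """Stand-in for training and evaluating a PPM model.

    The numbers are invented for illustration, not experimental results.
    """
    score = 0.60
    if config["encoding"] == "index":
        score += 0.05
    if config["model"] == "random_forest":
        score += 0.10
    if config["time_features"]:
        score += 0.02
        if config["model"] == "random_forest":
            score += 0.04  # toy interaction between two design choices
    return score


def explore(design_space, score_fn):
    """Exhaustive grid exploration: one (config, score) pair per combination."""
    names = list(design_space)
    results = []
    for combo in product(*design_space.values()):
        config = dict(zip(names, combo))
        results.append((config, score_fn(config)))
    return results


def shapley_attribution(target, baseline, score_fn):
    """Exact Shapley values of switching each choice from baseline to target.

    Averages the marginal score change of each choice over all orderings.
    """
    names = list(target)
    orders = list(permutations(names))
    contrib = {name: 0.0 for name in names}
    for order in orders:
        current = dict(baseline)
        previous = score_fn(current)
        for name in order:
            current[name] = target[name]
            new = score_fn(current)
            contrib[name] += new - previous
            previous = new
    return {name: total / len(orders) for name, total in contrib.items()}


results = explore(DESIGN_SPACE, toy_score)
baseline = {name: options[0] for name, options in DESIGN_SPACE.items()}
target = {name: options[-1] for name, options in DESIGN_SPACE.items()}
attribution = shapley_attribution(target, baseline, toy_score)
```

The attributions sum to the score difference between the target and baseline configurations, so each value can be read as that design choice's share of the change. With a larger design space, exhaustive enumeration of orderings becomes impractical, which is one reason approximate methods such as SHAP are used instead.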
computer science, information systems, artificial intelligence