Mining Interesting Sequential Patterns using a Novel Balanced Utility Measure
Hai Duong, Tin Truong, Bac Le, Philippe Fournier-Viger
DOI: https://doi.org/10.1016/j.knosys.2024.111796
IF: 8.139
2024-04-19
Knowledge-Based Systems
Abstract: High utility sequential pattern (HUSP) mining (HUSM) is an emerging task in data mining. The goal is to identify sequential patterns in a quantitative sequence database that have high importance, as measured by a utility function. Nevertheless, a limitation of HUSM is that a pattern may appear multiple times in an input sequence, and as a consequence, the utility of a pattern may be calculated in many different ways. Until now, most studies on HUSM have focused on two utility functions, called the maximum and minimum utility, which define the utility of a pattern in a sequence as the largest or smallest value, respectively. However, these two functions are two extremes, that is, they represent the best and worst cases. This is unsuitable for many practical situations, such as business decision-making, where overestimating or underestimating the utility can be very risky. To avoid these extremes, this paper introduces a novel utility function ū, called balanced utility. It allows evaluating the importance of a pattern based on the average of its occurrences in a sequence. To efficiently mine HUSPs with ū, two novel upper bounds (UBs) and a weak UB on ū are developed. These bounds are utilized as a theoretical basis for designing new pruning strategies, which are integrated with an ESUL structure in a novel algorithm named MISP-BU, for efficiently mining frequent HUSPs with ū. Extensive experiments have confirmed that MISP-BU is highly efficient in terms of execution time, memory usage, and scalability.
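Illustration: the difference between the maximum, minimum, and averaged utility of a pattern within one sequence can be shown on a toy example. The following Python sketch is not the MISP-BU algorithm and does not reproduce the paper's formal definition of ū, its upper bounds, or the ESUL structure. It assumes that a quantitative sequence is a list of itemsets mapping each item to its utility, that an occurrence is any order-preserving match of the pattern's itemsets, and that the balanced value is the plain arithmetic mean over all occurrences. The function names and the sample data are hypothetical.

```python
from itertools import combinations


def occurrence_utilities(pattern, qseq):
    """Return the utility of every occurrence of `pattern` in `qseq`.

    pattern: list of sets of items, e.g. [{"a"}, {"b"}]
    qseq:    list of dicts mapping item -> utility (assumed representation)
    """
    utilities = []
    # Try every increasing choice of positions, one per pattern itemset.
    for positions in combinations(range(len(qseq)), len(pattern)):
        matched = all(
            all(item in qseq[pos] for item in itemset)
            for itemset, pos in zip(pattern, positions)
        )
        if matched:
            utilities.append(
                sum(
                    qseq[pos][item]
                    for itemset, pos in zip(pattern, positions)
                    for item in itemset
                )
            )
    return utilities


def utility_summary(pattern, qseq):
    """Maximum, minimum, and mean utility over all occurrences (assumed reading)."""
    utils = occurrence_utilities(pattern, qseq)
    if not utils:
        return {"max": 0, "min": 0, "mean": 0.0}
    return {"max": max(utils), "min": min(utils), "mean": sum(utils) / len(utils)}


# Hypothetical sequence of four itemsets; the pattern <a, b> occurs four times.
qseq = [{"a": 2, "b": 6}, {"a": 4}, {"b": 1, "c": 3}, {"b": 5}]
pattern = [{"a"}, {"b"}]
summary = utility_summary(pattern, qseq)
print(summary)
```

In this toy sequence the four occurrences have utilities 3, 7, 5, and 9, so the maximum (9) and minimum (3) sit at the two extremes while the mean (6.0) lies between them. Enumerating all position combinations is exponential in the pattern length and is used here only for clarity.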
computer science, artificial intelligence