TaSPM: Targeted Sequential Pattern Mining

Gengsen Huang,Wensheng Gan,Philip S. Yu
DOI: https://doi.org/10.1145/3639827
IF: 4.157
2024-02-28
ACM Transactions on Knowledge Discovery from Data
Abstract:Sequential pattern mining (SPM) is an important technique in the field of pattern mining, which has many applications in reality. Although many efficient SPM algorithms have been proposed, there are few studies that can focus on targeted tasks. Targeted querying of the concerned sequential patterns can not only reduce the number of patterns generated, but also increase the efficiency of users in performing related analysis. The current algorithms available for targeted sequence querying are based on specific scenarios and can not be extended to other applications. In this article, we formulate the problem of targeted sequential pattern mining and propose a generic algorithm, namely TaSPM. What is more, to improve the efficiency of TaSPM on large-scale datasets and multiple-item-based sequence datasets, we propose several pruning strategies to reduce meaningless operations in the mining process. Totally four pruning strategies are designed in TaSPM, and hence TaSPM can terminate unnecessary pattern extensions quickly and achieve better performance. Finally, we conducted extensive experiments on different datasets to compare the baseline SPM algorithm with TaSPM. Experiments show that the novel targeted mining algorithm TaSPM can achieve faster running time and less memory consumption.
computer science, information systems, software engineering
What problem does this paper attempt to address?