Self-adaptive nonoverlapping sequential pattern mining
Yuehua Wang,Youxi Wu,Yan Li,Fang Yao,Philippe Fournier-Viger,Xindong Wu
DOI: https://doi.org/10.1007/s10489-021-02763-y
IF: 5.3
2021-09-14
Applied Intelligence
Abstract:Repetitive sequential pattern mining (SPM) with gap constraints is a data analysis task that consists of identifying patterns (subsequences) appearing many times in a discrete sequence of symbols or events. By using gap constraints, the user can filter many meaningless patterns, and focus on those that are the most interesting for his needs. However, it is difficult to set appropriate gap constraints without prior knowledge. Hence, users generally find suitable constraints by trial and error, which is time-consuming. Besides, current algorithms are inefficient as they repeatedly check whether the gap constraints are satisfied. To address these problems, this paper presents a complete algorithm called SNP-Miner that has two key phases: candidate pattern generation and support (number of occurrences or occurrence frequency) calculation. To reduce the number of candidate patterns, SNP-Miner employs a pattern join strategy. Moreover, to efficiently calculate the support, SNP-Miner uses an incomplete Nettree structure stored in an array, and scans the structure once to avoid redundant calculations and reduce the time complexity. Experimental results show that SNP-Miner not only outperforms competitive algorithms, but can also discover more valuable patterns without user-predefined gap constraints. Algorithms and data can be downloaded from https://github.com/wuc567/Pattern-Mining/tree/master/SNP-Miner.
computer science, artificial intelligence