SeqGen: Mining sequential generator patterns from sequence databases

Shengwei Yi, Tianheng Zhao, Yuanyuan Zhang, Shilong Ma, Jie Yin, Zhanbin Che
DOI: https://doi.org/10.1166/asl.2012.3008
2012-01-01
Advanced Science Letters
Abstract: With the wide application of data mining techniques, discovering significant sequential patterns has attracted increasing attention in the data mining, biological information, and financial analysis communities in recent years. Due to the downward closure property, recent research on frequent sequential pattern mining focuses on maximal sequential patterns, closed sequential patterns, and sequential generators rather than on the complete set of frequent sequential patterns. Although all three approaches compress the set of frequent sequential patterns, maximal sequential patterns are a lossy compression, and closed sequential patterns are inferior to sequential generator patterns for classification and model selection. However, existing algorithms for mining sequential generators spend too much time in redundant search spaces because they fail to consider the relationship between a sequence and its subsequences. In this paper, a novel depth-first-search-based algorithm is designed to mine sequential generators effectively. To avoid redundant search and computation, we further propose a safe pruning strategy and a fast sequential generator checking mechanism. In our experiments, a comprehensive performance study has been carried out on both real-world and synthetic datasets. The experimental results show that the proposed methods have low time cost and high scalability. © 2012 American Scientific Publishers. All rights reserved.
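To make the notion of a sequential generator concrete, the following is a minimal brute-force sketch in Python. It is not the SeqGen algorithm from the paper and includes neither its safe pruning strategy nor its fast generator check. It assumes a simplified setting in which every event is a single item, a minimum support of at least 1, and a definition under which a frequent sequence is a generator when no non-empty proper subsequence has the same support. All function names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_patterns(database, min_support):
    """Depth-first enumeration of frequent patterns (assumes min_support >= 1)."""
    items = sorted({x for seq in database for x in seq})
    result = {}

    def extend(prefix):
        for item in items:
            candidate = prefix + (item,)
            sup = support(candidate, database)
            # Downward closure: an infrequent pattern has no frequent extension.
            if sup >= min_support:
                result[candidate] = sup
                extend(candidate)

    extend(())
    return result


def is_generator(pattern, database):
    """Naive check: no one-item deletion may keep the same support.

    Checking one-item deletions is enough because support never
    decreases when an item is removed. The empty sequence is skipped
    (an assumption of this sketch).
    """
    sup = support(pattern, database)
    for i in range(len(pattern)):
        sub = pattern[:i] + pattern[i + 1:]
        if sub and support(sub, database) == sup:
            return False
    return True


def sequential_generators(database, min_support):
    """Frequent patterns that pass the naive generator check."""
    frequent = frequent_patterns(database, min_support)
    return {p: s for p, s in frequent.items() if is_generator(p, database)}


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "b", "c"), ("a", "c"), ("b",)]
    for pattern, sup in sorted(sequential_generators(db, 2).items()):
        print(pattern, sup)
```

In this toy database, the pattern ("a", "c") is frequent but not a generator, because its subsequence ("c",) has the same support. The sketch recomputes support from scratch for every candidate, which is the kind of redundant work that dedicated algorithms aim to avoid.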