Efficiently Mining Closed Subsequences with Gap Constraints

Chun Li, Jianyong Wang
DOI: https://doi.org/10.1137/1.9781611972788.28
2008-01-01
Abstract: Mining frequent subsequence patterns from sequence databases is a typical data mining problem, and various efficient sequential pattern mining algorithms have been proposed. In many problem domains (e.g., biology), the frequent subsequences confined by predefined gap requirements are more meaningful than general sequential patterns. In this paper, we re-examine the closed sequential pattern mining problem by introducing gap constraints. The most challenging parts of this task are the constrained pattern closure checking and the pruning of unpromising search space. Inspired by some state-of-the-art closed or constrained sequential pattern mining algorithms, we propose an efficient approach to finding the complete set of closed sequential patterns with gap constraints. The approach combines a newly devised constrained pattern closure checking scheme and pruning techniques with the pattern-growth-based subsequence enumeration framework. Our extensive performance study shows that our approach is very efficient in mining frequent closed subsequences with gap constraints.
What problem does this paper attempt to address?