Global Sequential Pattern Mining in Distributed Environment

HU Kong-fa, ZHANG Chang-hai, CHEN Ling, SONG Ai-bo, DA Qing-li
DOI: https://doi.org/10.3969/j.issn.1006-5911.2007.11.024
2007-01-01
Computer Integrated Manufacturing Systems
Abstract: Sequential pattern mining algorithms in distributed environments generated too many candidate sequences, which led to high communication overhead. To deal with this problem, a new algorithm for distributed systems, Fast Mining of Global Sequential Pattern (FMGSP), was proposed. The core idea of the algorithm was to compress local frequent sequential patterns into the corresponding lexicographic sequence tree so that repeated prefixes were not transmitted. Exploiting the regular and simple form of the sequences in the merged trees, a new pruning method named Item Extension and Sequence Extension (I/S-E) pruning was presented to prune candidate sequences effectively. As a result, communication overhead was significantly reduced and global sequential patterns were generated quickly. Theoretical analysis and experiments showed that FMGSP achieved superior performance and was especially effective for mining global sequential patterns from very large amounts of data.
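The prefix-compression idea in the abstract can be illustrated with a minimal sketch. The abstract does not give FMGSP's actual data structures, encoding, or the I/S-E pruning rule, so everything below is an assumption made for illustration: the names `Pattern`, `to_steps`, `Node`, `build_tree`, and `count_nodes` are hypothetical, the sample patterns and supports are invented, and "cost" is simply a count of extension steps. Each pattern is flattened into sequence-extension (S) steps, which start a new itemset, and item-extension (I) steps, which add an item to the current itemset; patterns that share a prefix then share nodes in the tree.

```python
from typing import Dict, List, Optional, Tuple

Pattern = Tuple[Tuple[str, ...], ...]  # a sequence of itemsets
Step = Tuple[str, str]                 # ("S" or "I", item)


def to_steps(pattern: Pattern) -> List[Step]:
    """Flatten a pattern into S-extension and I-extension steps."""
    steps: List[Step] = []
    for itemset in pattern:
        for pos, item in enumerate(sorted(itemset)):
            # The first item of an itemset opens a new element (S-step);
            # later items extend the same element (I-step).
            steps.append(("S" if pos == 0 else "I", item))
    return steps


class Node:
    """One node of the (assumed) lexicographic sequence tree."""

    def __init__(self) -> None:
        self.children: Dict[Step, "Node"] = {}
        self.support: Optional[int] = None  # set where a frequent pattern ends


def build_tree(patterns: Dict[Pattern, int]) -> Node:
    """Insert every local frequent pattern; shared prefixes share nodes."""
    root = Node()
    for pattern, support in patterns.items():
        node = root
        for step in to_steps(pattern):
            node = node.children.setdefault(step, Node())
        node.support = support
    return root


def count_nodes(node: Node) -> int:
    """Number of non-root nodes, i.e. steps that would be transmitted."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Invented local frequent patterns with their local supports.
local_patterns: Dict[Pattern, int] = {
    (("a",),): 5,
    (("a",), ("b",)): 4,
    (("a",), ("b",), ("c",)): 3,
    (("a",), ("b", "c")): 2,
    (("a", "b"),): 3,
}

tree = build_tree(local_patterns)
flat_cost = sum(len(to_steps(p)) for p in local_patterns)  # patterns sent one by one
tree_cost = count_nodes(tree)                              # patterns sent as one tree
print(flat_cost, tree_cost)
```

In this toy example the five patterns contain 11 extension steps when listed separately but only 5 nodes when stored as a tree, because the prefix `<(a)>` and the prefix `<(a)(b)>` are each stored once. How a site would serialize such a tree, and how the merged trees are pruned, is outside what the abstract describes.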