Mining Sequential Patterns Across Data Streams

Gong Chen,Xindong Wu,Xingquan Zhu
2005-01-01
Abstract:There are extensive endeavors toward mining frequent items or itemsets in a single data stream, but rare efforts have been made to explore sequential patterns among literals in different data streams. In this paper, we define a challenging problem of mining frequent sequential patterns across multiple data streams. We propose an efficient algorithm MILE to manage the mining process. The proposed algorithm recursively utilizes the knowledge of existing patterns to make new patterns’ mining fast. We also apply a state-of-the-art sequential pattern mining algorithm PrefixSpan which was designed for transaction databases to solve our problem. Extensive empirical results show that MILE is significantly faster than PrefixSpan. One unique feature of our algorithm is when some prior knowledge of the data distribution in the data streams is available, it can be incorporated into the mining process to further improve the performance of MILE. As MILE consumes more memory than PrefixSpan, we also propose a solution to balance the memory usage and time efficiency in memory limited environments.
What problem does this paper attempt to address?