Parallel Frequent Pattern Discovery: Challenges and Methodology

Zhang Yuzhou, Wang Jianyong, Zhou Lizhu
DOI: https://doi.org/10.1016/s1007-0214(07)70181-5
2007-01-01
Abstract: Parallel frequent pattern discovery algorithms exploit parallel and distributed computing resources to relieve the sequential bottlenecks of current frequent pattern mining (FPM) algorithms. As a result, parallel FPM algorithms achieve better scalability and performance, and they are attracting much attention in the data mining research community. This paper presents a comprehensive survey of state-of-the-art parallel and distributed frequent pattern mining algorithms, with particular emphasis on pattern discovery from complex data (e.g., sequences and graphs) on various platforms. A review of typical parallel FPM algorithms uncovers the major challenges, methodologies, and research problems in the field of parallel frequent pattern discovery, such as workload balancing, finding good data layouts, and data decomposition. This survey also indicates a dramatic shift of research interest in the field, from simple parallel frequent itemset mining on traditional parallel and distributed platforms to parallel pattern mining of more complex data on emerging architectures, such as multi-core systems and the increasingly mature grid infrastructure.