GPU-Based Efficient Parallel Heuristic Algorithm for High-Utility Itemset Mining in Large Transaction Datasets

Wei Fang,Haipeng Jiang,Hengyang Lu,Jun Sun,Xiaojun Wu,Jerry Chun-Wei Lin
DOI: https://doi.org/10.1109/tkde.2023.3290371
IF: 9.235
2023-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract:Heuristic algorithms have been developed to find approximate solutions for high-utility itemset mining (HUIM) problems that compensate for the performance bottlenecks of exact algorithms. However, heuristic algorithms still face the problem of long runtime and insufficient mining quality, especially for large transaction datasets with thousands to tens of thousands of items and up to millions of transactions. To solve these problems, a novel GPU-based efficient parallel heuristic algorithm for HUIM (PHA-HUIM) is proposed in this paper. The iterative process of PHA-HUIM consists of three main steps: the search strategy, fitness evaluation, and ring topology communication. The search strategy and ring topology communication are designed to run in constant time on GPU. The parallelism of fitness evolution helps to substantially accelerate the algorithm. A new data structure with a sort-mapping strategy is proposed to enhance the search ability and reduce memory usage. To improve the mining quality, a multi-start strategy with an unbalanced allocation strategy is employed in the search process. Ring topology communication is adopted to maintain population diversity. A load balancing strategy is introduced to reduce the thread divergence to improve the parallel efficiency. The experimental results on nine large datasets show that PHA-HUIM outperforms state-of-the-art HUIM algorithms in terms of speedup performance, runtime, and mining quality.
computer science, information systems, artificial intelligence,engineering, electrical & electronic
What problem does this paper attempt to address?