Efficient Skyline Frequent-Utility Itemset Mining Algorithm on Massive Data

Jingxuan He, Xixian Han, Xiaolong Wan, Jinbao Wang
DOI: https://doi.org/10.1109/tkde.2024.3349454
IF: 9.235
2024-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Frequent itemset mining (FIM) and high-utility itemset mining (HUIM) are two important branches of itemset mining, a key knowledge-discovery technology in many applications. Many algorithms exist for FIM and HUIM, but few studies consider frequency and utility together, so skyline frequent-utility itemset mining (SFUIM) was proposed to find useful itemsets under both frequency and utility measures. SFUIM is nevertheless more challenging than FIM and HUIM: with no threshold to bound the search, the search space is large and the computation is expensive, especially on large-scale databases. To address this, this paper proposes PSI*, a novel prefix-based algorithm for mining skyline frequent-utility itemsets on massive data. PSI* divides the huge database by prefix-based partitioning, so that computing itemsets with a specific prefix item involves only one partition instead of the whole database. A multilevel-index-based list is presented to compactly maintain the maximal utility under the frequency constraint, and a novel grid-based structure is devised to organize partitions or items in a designed order. Moreover, four efficient pruning strategies are proposed to prune itemsets as early as possible. Extensive experiments show that PSI* outperforms the state-of-the-art algorithms, most notably on large-scale databases.
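To make the notion of a skyline frequent-utility itemset concrete, here is a minimal brute-force sketch in Python. It is not PSI* and uses none of its partitioning, indexing, or pruning; it only enumerates every itemset on a tiny hypothetical database and keeps those not dominated in both frequency and utility. The dominance rule used here (at least as good on both measures and different on at least one) is the definition commonly used in the SFUIM literature and is an assumption, since the abstract does not spell it out. The toy transactions and all function names are illustrative.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to the
# utility that item contributes in that transaction.
TRANSACTIONS = [
    {"a": 5, "b": 2},
    {"a": 5, "c": 1},
    {"a": 5, "b": 2, "c": 1},
    {"b": 2, "d": 20},
]


def frequency_and_utility(itemset, transactions):
    """Return (frequency, utility) of an itemset over the transactions."""
    freq = 0
    util = 0
    for t in transactions:
        if all(item in t for item in itemset):
            freq += 1
            util += sum(t[item] for item in itemset)
    return freq, util


def dominates(p, q):
    """Assumed dominance rule: p is no worse than q on both measures
    and differs from q on at least one."""
    return p[0] >= q[0] and p[1] >= q[1] and p != q


def skyline_itemsets(transactions):
    """Enumerate all itemsets (exponential; for illustration only) and
    return the ones whose (frequency, utility) pair is not dominated."""
    items = sorted({item for t in transactions for item in t})
    scored = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            freq, util = frequency_and_utility(combo, transactions)
            if freq > 0:
                scored[combo] = (freq, util)
    return {
        itemset: score
        for itemset, score in scored.items()
        if not any(dominates(other, score) for other in scored.values())
    }


if __name__ == "__main__":
    for itemset, (freq, util) in skyline_itemsets(TRANSACTIONS).items():
        print(itemset, "frequency =", freq, "utility =", util)
```

On this toy input the skyline contains two itemsets: {a}, which is the most frequent, and {b, d}, which has the highest utility. Because every itemset is enumerated and no threshold cuts the search short, the cost grows exponentially with the number of items, which is the difficulty the abstract attributes to SFUIM on large databases.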