Mining High Occupancy Itemsets.

Zhi-Hong Deng
DOI: https://doi.org/10.1016/j.future.2019.07.039
IF: 7.307
2019-01-01
Future Generation Computer Systems
Abstract: Frequent itemset mining has been studied extensively in data mining for more than two decades because of its numerous applications. However, the classic support-based mining framework used by most previous studies is not suitable for some real-world applications, such as travel landscape recommendation, where occupancy, in addition to support, plays a key role in evaluating the interestingness of an itemset. In this paper, we propose a new kind of task based on occupancy, namely high occupancy mining, by introducing occupancy into the support-based mining framework. An efficient algorithm, HEP (an abbreviation for High Efficient algorithm for mining high occupancy itemsets), is developed to discover all high occupancy itemsets. HEP uses a structure, named the occupancy-list, to store the occupancy information about an itemset, and employs an iterative level-wise approach to mine high occupancy itemsets via a pruning strategy based on an upper bound on occupancy. Substantial experiments on both synthetic and real datasets show that HEP is efficient for mining high occupancy itemsets and is at least one order of magnitude faster than the baseline algorithm.
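The abstract does not give the definitions, so the following is only a minimal sketch of level-wise mining with upper-bound pruning, not the HEP algorithm itself. It assumes that the occupancy of an itemset X is the sum of |X| / |T| over the transactions T that contain X, that the threshold `min_occ` is an absolute value, and that a plain set of transaction ids can stand in for the occupancy-list. The bound in `upper_bound` is a simple one derived for this illustration (any superset of size k can only be supported by transactions of length at least k) and may differ from the bound used in the paper. All function and parameter names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def occupancy(tids: FrozenSet[int], size: int, lengths: Sequence[int]) -> float:
    """Assumed definition: sum of |X| / |T| over supporting transactions T."""
    return sum(size / lengths[t] for t in tids)


def upper_bound(tids: FrozenSet[int], size: int, lengths: Sequence[int]) -> float:
    """Illustrative bound on the occupancy of every proper superset of X.

    A superset of size k is supported only by transactions of X's tid set
    whose length is at least k, so its occupancy is at most the sum of
    k / |T| over those transactions. We take the maximum over all k > |X|.
    """
    max_len = max((lengths[t] for t in tids), default=0)
    best = 0.0
    for k in range(size + 1, max_len + 1):
        bound_k = sum(k / lengths[t] for t in tids if lengths[t] >= k)
        best = max(best, bound_k)
    return best


def mine_high_occupancy(
    transactions: Sequence[FrozenSet[str]], min_occ: float
) -> Dict[Tuple[str, ...], float]:
    """Level-wise search; an itemset is extended only if its bound passes."""
    lengths = [len(t) for t in transactions]

    # Build one tid set per item (a stand-in for the occupancy-list).
    tidlists: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidlists.setdefault(item, set()).add(tid)
    items = sorted(tidlists)

    level: List[Tuple[Tuple[str, ...], FrozenSet[int]]] = [
        ((item,), frozenset(tidlists[item])) for item in items
    ]
    result: Dict[Tuple[str, ...], float] = {}

    while level:
        next_level = []
        for itemset, tids in level:
            occ = occupancy(tids, len(itemset), lengths)
            if occ >= min_occ:
                result[itemset] = occ
            # Prune: no superset can reach min_occ if the bound is too low.
            if upper_bound(tids, len(itemset), lengths) < min_occ:
                continue
            # Extend with lexicographically larger items so that each
            # itemset is generated exactly once, from its sorted prefix.
            for item in items:
                if item > itemset[-1]:
                    new_tids = tids & tidlists[item]
                    if new_tids:
                        next_level.append((itemset + (item,), new_tids))
        level = next_level
    return result


if __name__ == "__main__":
    tx = [
        frozenset("ab"),
        frozenset("abc"),
        frozenset("abcd"),
        frozenset("cd"),
    ]
    for itemset, occ in sorted(mine_high_occupancy(tx, 2.0).items()):
        print(itemset, round(occ, 4))
```

Under the assumed definition, occupancy is not anti-monotone: in the toy data, {a} alone falls below a threshold of 2.0 while {a, b} exceeds it. This is why the sketch prunes on the bound rather than on the itemset's own occupancy.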