MRI-CE: Minimal rare itemset discovery using the cross-entropy method

Wei Song, Zhen Sun, Philippe Fournier-Viger, Youxi Wu
DOI: https://doi.org/10.1016/j.ins.2024.120392
IF: 8.1
2024-03-08
Information Sciences
Abstract: Rare itemsets have been studied less extensively than frequent itemsets, but they have important potential applications related to black swan events, such as anomaly detection. Mining rare itemsets poses two challenges: the number of results can be very large, and the mining process can incur high computational overhead. To address both challenges, one can mine minimal rare itemsets (MRIs) and use heuristic methods that return approximate rather than exact results. This paper describes a novel algorithm, MRI-CE, that mines MRIs using the cross-entropy (CE) method. We present how the MRI mining problem is modeled in MRI-CE and introduce a progressive checking strategy that enables more MRIs to be discovered in each iteration. The discovered MRIs are then used to update a probability vector. We also design two optimization strategies to improve the algorithm's performance: the adaptive sample size strategy narrows the search space as the number of iterations increases, and the crossover-based individual generation strategy improves the diversity of the samples. To evaluate MRI-CE, we compare it with six competitive algorithms in extensive experiments on publicly available datasets. The results show that the proposed algorithm is both efficient and highly accurate. Furthermore, we experimentally verify the effectiveness of the two optimization strategies.
computer science, information systems
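The abstract outlines a CE loop in which sampled itemsets are checked for being MRIs and the discovered MRIs update a probability vector. The following is a minimal sketch of a generic cross-entropy search for MRIs. It is not the authors' MRI-CE implementation. It makes several assumptions:

- Transactions are Python sets.
- `minsup` is an absolute count.
- An itemset is rare when its support is below `minsup`.
- An MRI is a rare itemset whose proper subsets are all frequent.

The greedy helper `reduce_to_minimal` is a hypothetical stand-in for the paper's progressive checking strategy, whose details the abstract does not give. The adaptive sample size and crossover strategies are omitted. All function names and the values of `alpha`, the clamp bounds, and the sample sizes are illustrative.

```python
import random
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_minimal_rare(itemset, transactions, minsup):
    """Rare (support < minsup) and every (k-1)-subset is frequent.

    Support is anti-monotone, so checking the immediate subsets is enough
    to guarantee that all proper subsets are frequent.
    """
    if not itemset or support(itemset, transactions) >= minsup:
        return False
    return all(support(itemset - {i}, transactions) >= minsup for i in itemset)


def reduce_to_minimal(itemset, transactions, minsup):
    """Hypothetical helper: greedily drop items while the itemset stays rare.

    One pass suffices: an item kept because its removal made the set frequent
    can never be removed later, because the set only shrinks.
    """
    current = set(itemset)
    for item in sorted(itemset):
        candidate = current - {item}
        if support(candidate, transactions) < minsup:
            current = candidate
    return frozenset(current)


def brute_force_mris(transactions, minsup):
    """Exact MRIs by exhaustive enumeration (only viable for tiny item sets)."""
    items = sorted(set().union(*transactions))
    return {frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if is_minimal_rare(frozenset(c), transactions, minsup)}


def mine_mris_ce(transactions, minsup, iterations=10, sample_size=20,
                 alpha=0.7, seed=0):
    """Sketch of a cross-entropy-style search for MRIs (assumed parameters)."""
    rng = random.Random(seed)
    items = sorted(set().union(*transactions))
    p = {item: 0.5 for item in items}  # Bernoulli probability vector
    found = set()
    for _ in range(iterations):
        elites = []  # MRIs discovered in this iteration act as the elite set
        for _ in range(sample_size):
            # Draw one candidate: include each item with probability p[item].
            sample = frozenset(i for i in items if rng.random() < p[i])
            if not sample or support(sample, transactions) >= minsup:
                continue  # empty or frequent samples are discarded
            mri = reduce_to_minimal(sample, transactions, minsup)
            if mri:
                elites.append(mri)
                found.add(mri)
        if elites:
            # Smoothed CE update toward item frequencies among this round's MRIs;
            # clamping to [0.05, 0.95] is an assumption to keep exploring.
            for i in items:
                freq = sum(i in e for e in elites) / len(elites)
                p[i] = min(0.95, max(0.05, (1 - alpha) * p[i] + alpha * freq))
    return found


if __name__ == "__main__":
    D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "d"}]
    approx = mine_mris_ce(D, minsup=2)
    exact = brute_force_mris(D, minsup=2)
    print("recall:", len(approx & exact) / len(exact))
```

On the toy database above with `minsup=2`, the exact MRIs are `{d}` and `{a, b, c}`. Every itemset the sketch returns is a true MRI, because reduction starts from a rare sample and stops only when each immediate subset is frequent. What the heuristic trades away is completeness, so recall against `brute_force_mris` is the natural accuracy measure on small inputs.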