Mining diverse sets of patterns with constraint programming using the pairwise Jaccard similarity relaxation

Arnold Hien, Noureddine Aribi, Samir Loudni, Yahia Lebbah, Abdelkader Ouali, Albrecht Zimmermann
DOI: https://doi.org/10.1007/s10601-024-09373-8
2024-10-03
Constraints
Abstract: In recent years, pattern mining has evolved from a slow-moving, repetitive three-step process to a much more agile and iterative/user-centric mining model. A crucial element of this framework is the capability to rapidly provide a set of diverse patterns to the user. This paper proposes a pattern mining approach based on constraint programming that incorporates a non-redundancy/diversity constraint into closed pattern enumeration. The level of diversity is controlled through a threshold on the maximum pairwise Jaccard similarity of pattern occurrences. We show that the Jaccard measure does not have nice (anti-)monotonicity properties w.r.t. the general-to-specific enumeration. To address this limitation, we propose anti-monotonic lower and upper-bound relaxations of the Jaccard similarity with nice pruning-enabling properties, and connect the final results to the original Jaccard Index. To evaluate the effectiveness of our relaxations, we conduct a comprehensive comparison against several existing pattern mining techniques designed to control redundancy. Experimental results illustrate that our approach provides an effective solution for mining diverse itemsets, showing competitive performance in both runtime and flexibility.
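To make the diversity criterion concrete, the following is a minimal Python sketch of the pairwise Jaccard similarity between pattern occurrences (covers) and of a threshold check against previously found patterns. It is not the paper's constraint programming model and does not implement the proposed relaxations; the function names `cover`, `jaccard`, and `is_diverse`, the parameter `j_max`, the non-strict comparison, and the toy data are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List

def cover(pattern: FrozenSet[str], transactions: List[FrozenSet[str]]) -> FrozenSet[int]:
    """Indices of the transactions that contain every item of the pattern."""
    return frozenset(i for i, t in enumerate(transactions) if pattern <= t)

def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two covers (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def is_diverse(candidate: FrozenSet[int],
               history: Iterable[FrozenSet[int]],
               j_max: float) -> bool:
    """True if the candidate's similarity to every earlier cover is at most j_max.

    Assumption: the threshold is treated as non-strict (<=).
    """
    return all(jaccard(candidate, h) <= j_max for h in history)

# Hypothetical toy dataset: five transactions over the items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
                frozenset("bc"), frozenset("a")]
cov_a = cover(frozenset("a"), transactions)    # transactions 0, 1, 2, 4
cov_ab = cover(frozenset("ab"), transactions)  # transactions 0, 1
cov_bc = cover(frozenset("bc"), transactions)  # transactions 0, 3

# Why Jaccard is awkward for general-to-specific search: specializing a
# pattern shrinks its cover, yet the similarity to a fixed earlier pattern
# can rise and then fall along the same chain of shrinking covers.
earlier = frozenset({1, 2, 3, 4})
chain = [frozenset(range(1, 9)), frozenset({1, 2, 3, 4, 5}), frozenset({1, 5})]
similarities = [jaccard(c, earlier) for c in chain]
```

In this toy chain the similarity to `earlier` first increases and then decreases as the cover shrinks, so a violated threshold on the exact Jaccard value cannot by itself justify pruning every specialization of a pattern, which is the gap the abstract's relaxations are meant to address.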
computer science, artificial intelligence, theory & methods