Mining high utility itemsets using extended chain structure and utility machine

Jun-Feng Qu, Philippe Fournier-Viger, Mengchi Liu, Bo Hang, Feng Wang
DOI: https://doi.org/10.1016/j.knosys.2020.106457
2020-11-01
Abstract: High utility itemsets are sets of items that have a high utility (e.g. a high profit or a high importance) in a transaction database. Discovering high utility itemsets has many important real-life applications, such as market basket analysis. Nonetheless, mining these patterns is time-consuming because of the huge search space and the high cost of utility computation. Most previous work is devoted to search space pruning but pays little attention to utility computation. In fact, both search space pruning and high utility itemset identification rely on the computation of various utilities. This paper proposes a novel algorithm named REX (Rapid itEmset eXtraction), which extends the classic d²HUP algorithm with an improved structure, a k-item utility machine, and an efficient switch strategy. The structure significantly reduces the time complexity of utility computation compared with the original structure used in d²HUP. The machine can quickly merge identical transactions and apply an efficient procedure for computing the utilities of extensions of a given itemset. The strategy, derived by trial and error, yields drastic performance improvements on some databases and remains competitive with the switch strategy used in d²HUP on other databases. Experimental results show that REX achieves a speedup ranging from fifty percent to three orders of magnitude over d²HUP, even though the two algorithms use identical pruning techniques, and that REX considerably outperforms state-of-the-art algorithms on real-life and synthetic databases.
computer science, artificial intelligence
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper primarily addresses the cost of utility computation in High Utility Itemset (HUI) mining. It proposes a new algorithm named REX (Rapid itEmset eXtraction), which speeds up high utility itemset mining through an improved data structure and faster utility computation.

#### Main Issues:

1. **Huge Search Space**: A transaction database containing n items may have up to 2^n different itemsets, leading to a massive search space.
2. **High Utility Computation Cost**: Calculating the utility of an itemset is time-consuming because it requires traversing all transactions that contain the itemset.

#### Solutions:

1. **Extended Chain Structure**: REX replaces the chain structure of the d²HUP algorithm with an Extended Chain Structure, which significantly reduces the time complexity of utility computation.
2. **k-item Utility Machine**: A k-item Utility Machine assigns a unique integer to each transaction, quickly merges identical transactions, and efficiently computes the utilities of the extensions of a given itemset.
3. **Efficient Switch Strategy**: A switch strategy derived by trial and error yields large performance gains on some databases while remaining competitive with the switch strategy of d²HUP on others.

### Summary

The paper improves the performance of high utility itemset mining by speeding up utility computation, and it performs particularly well on databases with long transactions. Experimental results show that REX achieves speedups ranging from fifty percent to three orders of magnitude over d²HUP, even though both use identical pruning techniques, and that it considerably outperforms state-of-the-art algorithms on real-life and synthetic databases.
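
To make the cost of utility computation and the idea of merging identical transactions concrete, here is a minimal Python sketch. It is not the REX algorithm, its extended chain structure, or its k-item utility machine. It is a naive illustration only, and the toy database, the profit table, and the function names `itemset_utility` and `merge_identical` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2},
    {"a": 1, "b": 2, "c": 1},
]

# Hypothetical unit profit of each item.
PROFIT = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, transactions, profit):
    """Sum the utility of `itemset` over every transaction that contains it.

    This naive version scans the whole database, which is the cost the
    summary describes as expensive.
    """
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def merge_identical(transactions):
    """Merge transactions that contain exactly the same items.

    Quantities are added item by item, so any itemset utility computed on
    the merged database equals the one computed on the original database.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for tx in transactions:
        key = frozenset(tx)  # transactions with the same items share a key
        for item, qty in tx.items():
            merged[key][item] += qty
    return [dict(quantities) for quantities in merged.values()]


if __name__ == "__main__":
    merged = merge_identical(TRANSACTIONS)
    print(len(TRANSACTIONS), "transactions merged into", len(merged))
    print(itemset_utility({"a", "c"}, TRANSACTIONS, PROFIT))
    print(itemset_utility({"a", "c"}, merged, PROFIT))
```

In this toy database the first and last transactions contain the same items, so merging leaves fewer transactions to scan while the utility of any itemset stays unchanged.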