Abstract: High utility itemsets are sets of items that have a high utility (e.g., a high profit or a high importance) in a transaction database. Discovering high utility itemsets has many important real-life applications, such as market basket analysis. Nonetheless, mining these patterns is time-consuming because of the huge search space and the high cost of utility computation. Most previous work is devoted to search space pruning and pays little attention to utility computation. In fact, both search space pruning and high utility itemset identification rely on computing various utilities. This paper proposes a novel algorithm named REX (Rapid itEmset eXtraction), which extends the classic d<math>2</math>HUP algorithm with an improved structure, a <math>k</math>-item utility machine, and an efficient switch strategy. The structure significantly reduces the time complexity of utility computation compared with the original structure used in d<math>2</math>HUP. The machine quickly merges identical transactions and applies an efficient procedure for computing the utilities of extensions of a given itemset. The switch strategy, derived by trial and error, yields drastic performance improvements on some databases and remains competitive with the switch strategy used in d<math>2</math>HUP on the others. Experimental results show that REX achieves a speedup ranging from fifty percent to three orders of magnitude over d<math>2</math>HUP, even though the two algorithms use identical pruning techniques, and that REX considerably outperforms state-of-the-art algorithms on real-life and synthetic databases.
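To make the notions of utility and search space concrete, here is a minimal brute-force sketch in Python. It is not REX or d2HUP; it only illustrates the common definition in which an itemset's utility is the sum of quantity times unit profit over every transaction that contains all of its items. The toy transactions, the `unit_profit` table, and the function names are assumptions introduced for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
    {"a": 1, "b": 2, "c": 1},
]
# Hypothetical per-unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * profit of the itemset's items over all transactions
    that contain every item of the itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def brute_force_huis(transactions, unit_profit, min_util):
    """Enumerate every non-empty itemset and keep those whose utility
    reaches min_util. Exponential in the number of items."""
    items = sorted(unit_profit)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = itemset_utility(itemset, transactions, unit_profit)
            if u >= min_util:
                result[itemset] = u
    return result


high_utility = brute_force_huis(transactions, unit_profit, min_util=20)
```

On this toy data with a threshold of 20, `("a", "b")` has utility 18 and is rejected while its superset `("a", "b", "c")` has utility 20 and is accepted. Utility is therefore not anti-monotone, which is one reason pruning and utility computation are both costly in this setting.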

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper primarily focuses on the utility computation problem in High Utility Itemset (HUI) mining. Specifically, it proposes a new algorithm named REX (Rapid itEmset eXtraction), which aims to make high utility itemset mining more efficient by improving the data structure and the utility computation strategy.

#### Main Issues:

1. **Huge Search Space**: A transaction database containing n items may have up to 2^n different itemsets, leading to a massive search space.
2. **High Utility Computation Cost**: Calculating the utility of each itemset is very time-consuming because it requires traversing all transactions that contain the itemset.

#### Solutions:

1. **Extended Chain Structure**: REX improves upon the chain structure of the d2HUP algorithm by introducing a new Extended Chain Structure, which can significantly reduce the time complexity of utility computation.
2. **k-item Utility Machine**: A k-item Utility Machine is proposed, which assigns a unique integer to each transaction so that identical transactions can be merged quickly, accelerating utility computation.
3. **Efficient Switch Strategy**: An efficient Switch Strategy, found by trial and error, lets the algorithm perform very well on certain databases while remaining competitive on others.

### Summary

The paper improves the performance of high utility itemset mining by speeding up utility computation, and it excels in particular on databases with long transactions. Experimental results show that REX significantly outperforms d2HUP and other existing algorithms on real-life and synthetic databases.
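The transaction-merging idea mentioned under Solutions can be sketched as follows. This is only an illustration under stated assumptions, not the paper's k-item utility machine: it assumes each transaction, restricted to a chosen list of items, is encoded as a bitmask integer, and that transactions with the same integer are merged by summing their per-item utilities. The toy data and the names `merge_identical` and `utility_from_merged` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
    {"a": 1, "b": 2, "c": 1},
]
# Hypothetical per-unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}
items = ["a", "b", "c", "d"]


def merge_identical(transactions, unit_profit, items):
    """Map each transaction to an integer bitmask over `items` and merge
    transactions with the same mask by summing per-item utilities."""
    index = {item: i for i, item in enumerate(items)}
    merged = defaultdict(lambda: [0] * len(items))
    for t in transactions:
        key = 0
        for item in t:
            if item in index:
                key |= 1 << index[item]
        if key == 0:
            continue  # transaction contains none of the chosen items
        row = merged[key]
        for item, qty in t.items():
            if item in index:
                row[index[item]] += qty * unit_profit[item]
    return dict(merged)


def utility_from_merged(itemset, merged, items):
    """Compute an itemset's utility by scanning merged rows instead of
    the original transactions."""
    index = {item: i for i, item in enumerate(items)}
    mask = 0
    for item in itemset:
        mask |= 1 << index[item]
    total = 0
    for key, row in merged.items():
        if key & mask == mask:  # merged row contains every item of the itemset
            total += sum(row[index[item]] for item in itemset)
    return total


merged = merge_identical(transactions, unit_profit, items)
```

Here the first and fourth transactions contain the same items and collapse into one merged row, so four transactions become three rows. The utility of `("a", "c")` computed from the merged rows matches the value obtained by scanning the original transactions, while touching fewer rows.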

Mining high utility itemsets using extended chain structure and utility machine
