Abstract: The High Average Utility Itemset Mining (HAUIM) technique, a variation of High Utility Itemset Mining (HUIM), uses the average utility of the itemsets. Historically, most HAUIM algorithms were designed for static databases. However, practical applications like market basket analysis and business decision-making necessitate regular updates of the database with new transactions. As a result, researchers have developed incremental HAUIM (iHAUIM) algorithms to identify HAUIs in a dynamically updated database. Contrary to conventional methods that begin from scratch, the iHAUIM algorithm facilitates incremental changes and outputs, thereby reducing the cost of discovery. This paper provides a comprehensive review of the state-of-the-art iHAUIM algorithms, analyzing their unique characteristics and advantages. First, we explain the concept of iHAUIM, providing formulas and real-world examples for a more in-depth understanding. Subsequently, we categorize and discuss the key technologies used by varying types of iHAUIM algorithms, encompassing Apriori-based, Tree-based, and Utility-list-based techniques. Moreover, we conduct a critical analysis of each mining method's advantages and disadvantages. In conclusion, we explore potential future directions, research opportunities, and various extensions of the iHAUIM algorithm.
### What problems does this paper attempt to solve?
This paper addresses the challenges of Incremental High Average Utility Itemset Mining (iHAUIM). Specifically, it focuses on how to efficiently identify High Average Utility Itemsets (HAUIs) in a dynamically updated database without reprocessing the entire database from scratch.
#### Background and problem description
1. **Limitations of static databases**:
- Traditional High Average Utility Itemset Mining (HAUIM) algorithms are mainly designed for static databases. When the database changes, these algorithms must rescan the entire database to update the results, which incurs high computational cost.
2. **Requirements for dynamic databases**:
- In practical applications, such as market basket analysis and business decision support, databases are updated continually as new transactions are inserted. An algorithm that handles these updates directly is therefore needed to avoid repeating earlier computation.
3. **Core problems in iHAUIM**:
- How to efficiently identify the high average utility itemsets that meet the minimum utility threshold after the database is updated, while avoiding unnecessary full-database scans.
#### Solutions and methods
The paper reviews the following solutions:
1. **Incremental update mechanism**:
- It surveys multiple incremental update algorithms (Apriori-based, tree-based, and utility-list-based methods) that process only the newly added transactions when the database is updated, thereby reducing the computational overhead.
2. **Optimization techniques**:
- It discusses optimization techniques, such as Fast Update (FUP), the pre-large concept, and upper-bound models, that improve mining efficiency and reduce the generation of unnecessary candidate itemsets.
3. **Classification and evaluation**:
- It classifies and evaluates existing iHAUIM algorithms, summarizes the advantages and disadvantages of each, and points out directions for future research.
Through these methods, the paper aims to provide a comprehensive framework that helps researchers and practitioners better understand and apply incremental high average utility itemset mining techniques, especially in real-time data processing and large-scale dynamic database environments.
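The idea of processing only the newly added transactions can be illustrated with a deliberately naive Python sketch. It is not any of the algorithms surveyed in the paper: it keeps a running average-utility total for every itemset it has seen, which is exponential in transaction length, whereas the surveyed methods rely on FUP, the pre-large concept, or upper bounds to avoid that. The class name `NaiveIncrementalHAUI`, the toy transactions, and the unit profits are all assumptions made for illustration.

```python
from itertools import combinations


class NaiveIncrementalHAUI:
    """Toy incremental miner: each insert touches only the new transactions."""

    def __init__(self, external_utility):
        self.external_utility = external_utility  # item -> unit profit (assumed)
        self.tu_db = 0   # running total utility of the database
        self.au = {}     # itemset (sorted tuple) -> accumulated average utility

    def insert(self, new_transactions):
        # Only the inserted transactions are scanned; earlier ones are never revisited.
        for t in new_transactions:
            utilities = {i: q * self.external_utility[i] for i, q in t.items()}
            self.tu_db += sum(utilities.values())
            items = sorted(utilities)
            for k in range(1, len(items) + 1):
                for itemset in combinations(items, k):
                    contribution = sum(utilities[i] for i in itemset) / k
                    self.au[itemset] = self.au.get(itemset, 0.0) + contribution

    def hauis(self, delta):
        # The threshold grows with the database, so earlier HAUIs can drop out.
        threshold = self.tu_db * delta
        return {x: v for x, v in self.au.items() if v >= threshold}


# Hypothetical usage: two original transactions, then one inserted later.
miner = NaiveIncrementalHAUI({"a": 5, "b": 2, "c": 1})
miner.insert([{"a": 2, "b": 1}, {"a": 1, "c": 3}])
before = miner.hauis(0.25)
miner.insert([{"b": 2, "c": 1}])
after = miner.hauis(0.25)
```

In this toy run the itemset `("a", "b")` qualifies before the insertion but not after it, because the new transaction raises the total database utility and therefore the threshold. Handling such status changes without rescanning the original database is the kind of case the incremental algorithms in the paper are designed for.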
### Formula examples
To understand the concept of iHAUIM more clearly, here are several key formulas:
- **Utility of an item**:
\[
u(i_j, T_p)=iu(i_j, T_p)\times eu(i_j)
\]
where \(u(i_j, T_p)\) represents the utility of item \(i_j\) in transaction \(T_p\), \(iu(i_j, T_p)\) is the internal utility (for example, the purchased quantity), and \(eu(i_j)\) is the external utility (for example, the unit profit).
- **Total utility of a transaction**:
\[
u(T_p)=\sum_{i_j\in T_p}u(i_j, T_p)
\]
- **Total utility of a database**:
\[
tuDB = \sum_{T_p\in DB}u(T_p)
\]
- **Average utility of an itemset**:
\[
au(X, T_p)=\frac{\sum_{i_j\in X}u(i_j, T_p)}{|X|}
\]
- **Average utility of an itemset in a database**:
\[
au(X)=\sum_{T_p\in DB\land X\subseteq T_p}au(X, T_p)
\]
- **Definition of high average utility itemset**:
\[
HAUI = \{X\mid au(X)\geq tuDB\times\delta\}
\]
where \(\delta\) is the minimum average utility threshold set by the user.
Through these formulas, the paper describes in detail how to measure the average utility of an itemset and decide whether it qualifies as a HAUI.
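As a minimal sketch of how these formulas fit together, the following Python code evaluates them by brute force on a static toy database. The transactions, the unit profits, and the function names are hypothetical and chosen only for illustration. Enumerating every itemset is feasible only for tiny examples and is not how the surveyed algorithms work.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> quantity (internal utility).
transactions = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"b": 2, "c": 1},
]
# Hypothetical unit profits (external utility).
external_utility = {"a": 5, "b": 2, "c": 1}


def item_utility(item, transaction):
    # u(i_j, T_p) = iu(i_j, T_p) * eu(i_j)
    return transaction[item] * external_utility[item]


def transaction_utility(transaction):
    # u(T_p) = sum of the utilities of all items in T_p
    return sum(item_utility(i, transaction) for i in transaction)


def average_utility(itemset, db):
    # au(X) = sum over transactions containing X of au(X, T_p)
    total = 0.0
    for t in db:
        if all(i in t for i in itemset):
            total += sum(item_utility(i, t) for i in itemset) / len(itemset)
    return total


def mine_hauis(db, delta):
    # HAUI = {X | au(X) >= tuDB * delta}, found by exhaustive enumeration.
    tu_db = sum(transaction_utility(t) for t in db)
    items = sorted({i for t in db for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            au = average_utility(itemset, db)
            if au >= tu_db * delta:
                result[itemset] = au
    return result


hauis = mine_hauis(transactions, 0.2)
```

Here the transaction utilities are 12, 8, and 5, so \(tuDB = 25\) and the threshold for \(\delta = 0.2\) is 5. The itemsets `("a",)`, `("b",)`, and `("a", "b")` meet it, with average utilities 15, 6, and 6 respectively.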