Mining Sequential Patterns in Uncertain Databases Using Hierarchical Index Structure

Kashob Kumar Roy, Md Hasibul Haque Moon, Md Mahmudur Rahman, Chowdhury Farhan Ahmed, Carson K. Leung
2024-04-01
Abstract: In this uncertain world, data uncertainty is inherent in many applications, and its importance is growing drastically due to the rapid development of modern technologies. Nowadays, researchers pay increasing attention to mining patterns in uncertain databases. A few recent works attempt to mine frequent uncertain sequential patterns. Despite their success, they fail to reduce the number of false-positive patterns generated during mining and to maintain the patterns efficiently. In this paper, we propose multiple theoretically tightened pruning upper bounds that substantially reduce the mining space. A novel hierarchical structure is introduced to maintain the patterns in a space-efficient way. Afterward, we develop a versatile framework for mining uncertain sequential patterns that can effectively handle weight constraints as well. Moreover, with the advent of incremental uncertain databases, existing works are not scalable. Several incremental sequential pattern mining algorithms exist, but they are limited to mining precise databases. Therefore, we propose a new technique to adapt our framework to mine patterns when the database is incremental. Finally, we conduct extensive experiments on several real-life datasets and show the efficacy of our framework in different applications.
Databases
What problem does this paper attempt to address?
This paper addresses several key problems in mining sequential patterns from uncertain databases:

1. **Reducing False-Positive Pattern Generation**: Existing methods generate many false-positive patterns (candidates that are generated and evaluated during mining but turn out to be infrequent), which wastes computation and resources.
2. **Efficiently Maintaining Candidate Patterns**: Existing methods maintain candidate patterns inefficiently, which makes computing their expected support costly and degrades overall performance.
3. **Lack of an Effective Weight Upper Bound**: For weighted pattern mining, existing methods lack an upper bound that accounts for weights while preserving the anti-monotone property.
4. **Scalability on Incremental Databases**: Many real-world databases are dynamic and grow incrementally. Existing uncertain sequential pattern mining algorithms cannot handle this growth efficiently, and rerunning a batch algorithm after every increment is impractical.

To address these problems, the authors propose a new framework with the following contributions:

- **Theoretically Tightened Pruning Upper Bounds**: Three tighter upper bounds (`expSupcap`, `wgtcap`, `wExpSupcap`) substantially shrink the mining space and thereby reduce false positives.
- **Hierarchical Index Structure**: A novel hierarchical index structure, `USeq-Trie`, maintains patterns space-efficiently.
- **Fast Expected-Support Calculation Method**: A method called `SupCalc` computes the expected support of patterns more quickly.
- **Efficient Uncertain Sequential Pattern Mining Algorithm**: An algorithm named `FUSP` mines sequential patterns in uncertain databases efficiently.
- **Incremental Mining Method**: For incremental databases, a new technique, `InUSP`, mines patterns as the database grows; it maintains Promising Frequent Sequences (PFS) to improve both mining efficiency and the completeness of the results.

Through these contributions, the paper aims to provide a more efficient, accurate, and broadly applicable method for mining uncertain sequential patterns, including in incremental databases.
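Expected-support computation and upper-bound pruning, as described above, can be illustrated with a minimal Python sketch. It assumes a definition common in this line of work: each itemset of an uncertain sequence maps items to existence probabilities, a pattern's probability in a sequence is the maximum, over all of its occurrences, of the product of the matched item probabilities, and its expected support is the sum of these probabilities over the database. The helper names (`seq_prob`, `exp_sup`, `s_ext_bound`, `frequent_s_extensions`) and the bound are hypothetical: the bound simply multiplies the prefix's probability by the item's largest probability in each sequence, so it is a stand-in in the spirit of `expSupcap`, not the paper's formula, and `seq_prob` is not the paper's `SupCalc`.

```python
from math import prod
from typing import Dict, List, Sequence

UncertainSeq = List[Dict[str, float]]  # itemsets mapping item -> existence probability
Pattern = List[List[str]]              # a sequential pattern: a list of itemsets


def seq_prob(pattern: Pattern, seq: UncertainSeq) -> float:
    """Maximum, over all occurrences of pattern in seq, of the product of item probabilities."""
    m, n = len(pattern), len(seq)
    # best[i][j]: best product for the first i pattern itemsets matched within seq[:j]
    best = [[1.0] * (n + 1)] + [[0.0] * (n + 1) for _ in range(m)]
    for i in range(1, m + 1):
        elem = pattern[i - 1]
        for j in range(1, n + 1):
            itemset = seq[j - 1]
            take = 0.0
            if all(x in itemset for x in elem):
                take = best[i - 1][j - 1] * prod(itemset[x] for x in elem)
            best[i][j] = max(best[i][j - 1], take)  # skip itemset j-1, or match elem there
    return best[m][n]


def exp_sup(pattern: Pattern, db: Sequence[UncertainSeq]) -> float:
    """Expected support: sum of the per-sequence pattern probabilities."""
    return sum(seq_prob(pattern, s) for s in db)


def s_ext_bound(prefix: Pattern, item: str, db: Sequence[UncertainSeq]) -> float:
    """Hypothetical upper bound on exp_sup(prefix + [[item]]).

    In every sequence, the prefix part of an occurrence contributes at most
    seq_prob(prefix, s) and the appended item at most its largest probability.
    """
    total = 0.0
    for s in db:
        max_pr = max((itemset.get(item, 0.0) for itemset in s), default=0.0)
        total += seq_prob(prefix, s) * max_pr
    return total


def frequent_s_extensions(prefix: Pattern, items: List[str],
                          db: Sequence[UncertainSeq], min_sup: float) -> Dict[str, float]:
    """Sequence-extend prefix with each item, pruning by the bound before exact counting."""
    result: Dict[str, float] = {}
    for item in items:
        if s_ext_bound(prefix, item, db) < min_sup:
            continue  # pruned: the exact expected support cannot reach min_sup
        sup = exp_sup(prefix + [[item]], db)
        if sup >= min_sup:
            result[item] = sup
    return result


# Toy uncertain database (hypothetical data).
db = [
    [{"a": 0.9}, {"b": 0.6, "c": 0.5}, {"b": 0.8}],
    [{"a": 0.4, "b": 0.7}, {"c": 0.9}],
]
result = frequent_s_extensions([["a"]], ["a", "b", "c", "d"], db, min_sup=0.5)
```

Because every factor is a probability of at most 1, the bound never falls below the exact expected support, so an extension whose bound is below `min_sup` (here `d`, with a bound of 0) is discarded without scanning its occurrences. On the toy database, the bound for extending `<a>` with `b` is about 1.0, while the exact expected support of `<a><b>` is about 0.72: in the second sequence `b` only co-occurs with `a` and never follows it.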
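A second small sketch illustrates why weighted mining needs a dedicated upper bound. It assumes, as is common in weighted uncertain sequential pattern mining, that a pattern's weight is the average weight of its items and that its weighted expected support is its expected support times that weight. The function names and numbers are hypothetical. Appending a heavy item can raise the average weight, so weighted expected support is not anti-monotone and cannot drive pruning directly. A valid but loose cap multiplies the prefix's expected support by the largest item weight. This works because, under the max-product definition, expected support never increases when a pattern is extended, and no average can exceed the maximum weight. The paper's `wgtcap` and `wExpSupcap` are described as tighter; this sketch does not reproduce them.

```python
from typing import Dict, List

Pattern = List[List[str]]  # a sequential pattern: a list of itemsets


def pattern_weight(pattern: Pattern, weights: Dict[str, float]) -> float:
    """Assumed pattern weight: the mean weight of all items in the pattern."""
    items = [x for itemset in pattern for x in itemset]
    return sum(weights[x] for x in items) / len(items)


def weighted_exp_sup(exp_sup: float, pattern: Pattern, weights: Dict[str, float]) -> float:
    """Assumed weighted expected support: expected support times pattern weight."""
    return exp_sup * pattern_weight(pattern, weights)


def max_weight_cap(prefix_exp_sup: float, weights: Dict[str, float]) -> float:
    """Hypothetical loose cap on the weighted expected support of any extension.

    Extensions never exceed their prefix's expected support, and their average
    item weight never exceeds the maximum item weight.
    """
    return prefix_exp_sup * max(weights.values())


weights = {"a": 0.2, "b": 1.0}  # hypothetical item weights
min_wsup = 0.2                  # hypothetical weighted threshold

# Hypothetical expected supports: 0.5 for <a>, 0.4 for its extension <a><b>.
w_prefix = weighted_exp_sup(0.5, [["a"]], weights)      # 0.5 * 0.2
w_ext = weighted_exp_sup(0.4, [["a"], ["b"]], weights)  # 0.4 * 0.6
cap = max_weight_cap(0.5, weights)                      # 0.5 * 1.0

# Pruning <a> on its own weighted support would lose <a><b>;
# pruning on the cap is safe because the cap is at least w_ext.
prune_prefix = cap < min_wsup
```

Here `<a>` alone has a weighted expected support of 0.1, below the threshold of 0.2, yet its extension `<a><b>` reaches about 0.24. A weight-aware upper bound, rather than the weighted support itself, must therefore decide whether a prefix can be pruned.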
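A prefix-sharing trie is one way to picture how a hierarchical index can store sequential patterns compactly. The sketch below is a hypothetical structure, not the paper's `USeq-Trie`. Each edge is labeled with an item and a flag saying whether that item starts a new itemset (a sequence extension) or joins the current one (an itemset extension). Items inside an itemset are sorted so that each pattern has exactly one path. Patterns with a common prefix share nodes, and a node records an expected support only where a stored pattern ends.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

Pattern = List[List[str]]   # a sequential pattern: a list of itemsets
EdgeKey = Tuple[str, bool]  # (item, starts_new_itemset)


@dataclass
class Node:
    children: Dict[EdgeKey, "Node"] = field(default_factory=dict)
    exp_sup: Optional[float] = None  # set only where a stored pattern ends


class PatternTrie:
    """Hypothetical prefix-sharing trie for sequential patterns."""

    def __init__(self) -> None:
        self.root = Node()
        self.size = 0  # number of nodes, excluding the root

    @staticmethod
    def _path(pattern: Pattern) -> Iterator[EdgeKey]:
        # Sorting items inside each itemset gives every pattern one canonical path.
        for itemset in pattern:
            for k, item in enumerate(sorted(itemset)):
                yield (item, k == 0)  # True: sequence extension, False: itemset extension

    def insert(self, pattern: Pattern, exp_sup: float) -> None:
        node = self.root
        for key in self._path(pattern):
            if key not in node.children:
                node.children[key] = Node()
                self.size += 1
            node = node.children[key]
        node.exp_sup = exp_sup

    def get(self, pattern: Pattern) -> Optional[float]:
        """Return the stored expected support, or None if the pattern is not stored."""
        node = self.root
        for key in self._path(pattern):
            child = node.children.get(key)
            if child is None:
                return None
            node = child
        return node.exp_sup


trie = PatternTrie()
trie.insert([["a"]], 1.2)  # hypothetical expected supports
trie.insert([["a"], ["b"]], 0.9)
trie.insert([["a"], ["b", "c"]], 0.5)
trie.insert([["a", "c"]], 0.7)
```

Storing `<a>`, `<a><b>`, `<a><(bc)>`, and `<(ac)>` creates four nodes for eight item occurrences, because each longer pattern reuses the path of its prefix. A lookup such as `trie.get([["a"], ["c", "b"]])` finds the same node as `<a><(bc)>` thanks to the sorted itemsets.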
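The intuition behind keeping Promising Frequent Sequences during incremental mining can be sketched as follows. None of these assumptions is confirmed by the summary: an increment only appends new sequences, so a tracked pattern's new expected support is its old value plus the expected support contributed by the new sequences; the minimum threshold is a ratio of the current database size; and a hypothetical buffer ratio `mu` (between 0 and 1) sets the lower threshold for PFS. Only patterns tracked in the previous round (frequent or promising) are updated. Discovering patterns that were in neither set is outside this sketch, and `classify_after_increment` is a hypothetical name.

```python
from typing import Dict, Tuple


def classify_after_increment(
    old_sup: Dict[str, float],
    delta_sup: Dict[str, float],
    db_size: int,
    min_sup_ratio: float,
    mu: float,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split tracked patterns into frequent (FS) and promising (PFS) sets.

    old_sup:   expected supports of patterns tracked before the increment.
    delta_sup: expected supports contributed by the newly appended sequences.
    db_size:   number of sequences after the increment.
    """
    threshold = min_sup_ratio * db_size  # minimum expected support after the increment
    buffer = mu * threshold              # hypothetical lower threshold for PFS
    frequent: Dict[str, float] = {}
    promising: Dict[str, float] = {}
    for pattern, sup in old_sup.items():
        new_sup = sup + delta_sup.get(pattern, 0.0)  # assumed additive over sequences
        if new_sup >= threshold:
            frequent[pattern] = new_sup
        elif new_sup >= buffer:
            promising[pattern] = new_sup
        # Patterns below the buffer are no longer tracked.
    return frequent, promising


old = {"<a>": 2.6, "<a><b>": 1.4, "<c>": 1.8, "<d>": 1.0}  # hypothetical values
new = {"<a>": 0.7, "<a><b>": 0.3}
fs, pfs = classify_after_increment(old, new, db_size=10, min_sup_ratio=0.3, mu=0.5)
```

With a threshold of 3.0 and a buffer of 1.5, `<a>` becomes frequent, `<a><b>` and `<c>` are kept as promising, and `<d>` is dropped. Keeping promising patterns means a later increment that pushes them over the threshold can be detected from stored values instead of a full re-mine; the exact PFS criterion used by `InUSP` may differ from this buffer rule.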