Abstract: Discovering frequent itemsets is a data analysis task used in numerous domains. It consists of finding sets of items (itemsets) that appear frequently in a set of database records (also called transactions). Although discovering frequent itemsets is useful, it can produce a large number of spurious patterns. As a result, users may spend a great deal of time analyzing the itemsets found by a frequent itemset mining algorithm to find the truly interesting ones. Hence, in recent years, a key research topic has emerged: discovering statistically significant patterns in databases. The most popular model for identifying statistically significant itemsets is that of non-redundant productive itemsets. The state-of-the-art algorithm for extracting this set of patterns is OPUS-Miner. A first drawback of that algorithm is that it is designed for static databases. A second drawback is that it discovers all patterns in a database; in other words, the user cannot search for itemsets containing specific items. This paper addresses these issues by defining the novel problem of discovering targeted non-redundant productive itemsets in dynamic databases. It presents an algorithm named IDPI+ (Interactive Discovery of Productive Itemsets), which stores transactions in a tree structure that can then be interactively queried to identify non-redundant productive itemsets containing specific items. A structure named Query-Tree is also introduced to process many queries at the same time. Moreover, to handle dynamic databases, efficient transaction insertion and deletion algorithms are provided to update the tree. An experimental evaluation on benchmark datasets containing various types of data showed that IDPI+ can handle thousands of queries per second on a desktop computer and that it is more than an order of magnitude faster than a baseline algorithm.
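
The abstract does not spell out when an itemset counts as productive. The sketch below assumes the definition used in the OPUS-Miner line of work, as we understand it: an itemset X is productive if, for every split of X into two non-empty disjoint parts Y and X \ Y, its observed relative support exceeds the product of the parts' relative supports, i.e. P(X) > P(Y) × P(X \ Y). OPUS-Miner additionally backs this condition with a statistical significance test and filters redundant itemsets; both steps are omitted here. The toy data and the function names `support` and `is_productive` are illustrative only.

```python
from itertools import combinations

# Toy database (illustrative): each transaction is a set of items.
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"milk", "eggs"},
    {"eggs"},
    {"milk"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def is_productive(itemset, transactions):
    """Assumed test: P(X) > P(Y) * P(X \\ Y) for every split of X into two
    non-empty disjoint parts Y and X \\ Y. No significance test is applied."""
    items = frozenset(itemset)
    if len(items) < 2:
        return False  # a single item cannot be split into two parts
    n = len(transactions)
    sup_x = support(items, transactions)
    for k in range(1, len(items)):
        for part in combinations(sorted(items), k):
            y = frozenset(part)
            # Integer form of sup(X)/n > (sup(Y)/n) * (sup(X\Y)/n), avoiding floats.
            if sup_x * n <= support(y, transactions) * support(items - y, transactions):
                return False
    return True


if __name__ == "__main__":
    for x in [{"bread", "butter"}, {"bread", "milk"}, {"bread", "butter", "milk"}]:
        print(sorted(x), support(x, TRANSACTIONS), is_productive(x, TRANSACTIONS))
```

On the toy data, {bread, butter} is productive (support 3 of 6, against 1.5 expected if bread and butter were independent), while {bread, milk} is not (support 1, against 1.5 expected). {bread, butter, milk} also fails, because splitting off milk leaves {bread, butter}, whose association already explains the observed support.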

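The abstract states that IDPI+ stores transactions in a tree that is updated on insertion and deletion, but it does not describe the tree itself. The sketch below assumes a plain prefix tree (items sorted in a fixed order, with a count on each node), a common choice in frequent pattern mining but not necessarily the design used by IDPI+; the classes `Node` and `TransactionTree` and their methods are hypothetical. It shows how a transaction can be inserted or deleted by walking a single path, and how the support of a set of target items can be counted without rescanning the database.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    item: str | None                                    # None only for the root
    parent: Node | None = field(default=None, repr=False)
    count: int = 0      # transactions whose path passes through this node
    ends: int = 0       # transactions whose path ends exactly at this node
    children: dict[str, Node] = field(default_factory=dict)


class TransactionTree:
    """Hypothetical prefix tree: each transaction is a path of items in sorted order."""

    def __init__(self) -> None:
        self.root = Node(None)
        self.size = 0   # number of stored transactions

    def insert(self, transaction: Iterable[str]) -> None:
        node = self.root
        for item in sorted(set(transaction)):
            child = node.children.get(item)
            if child is None:
                child = Node(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
        node.ends += 1
        self.size += 1

    def delete(self, transaction: Iterable[str]) -> None:
        """Remove one previously inserted copy of `transaction`."""
        items = sorted(set(transaction))
        path: list[Node] = []
        node = self.root
        for item in items:
            nxt = node.children.get(item)
            if nxt is None:
                raise KeyError(f"transaction not stored: {items}")
            node = nxt
            path.append(node)
        if node.ends == 0:  # path exists only as a prefix of longer transactions
            raise KeyError(f"transaction not stored: {items}")
        node.ends -= 1
        self.size -= 1
        for step in path:
            step.count -= 1
            if step.count == 0:
                # No transaction passes through `step` any more, so its whole
                # subtree is empty too: detach it and stop.
                del step.parent.children[step.item]
                break

    def support(self, itemset: Iterable[str]) -> int:
        """Count stored transactions that contain every item of `itemset`."""
        query = sorted(set(itemset))
        if not query:
            return self.size

        def visit(node: Node, i: int) -> int:
            total = 0
            for item, child in node.children.items():
                if item > query[i]:
                    continue  # paths are sorted, so query[i] cannot occur below here
                j = i + 1 if item == query[i] else i
                # Once all query items are matched, the node count covers its subtree.
                total += child.count if j == len(query) else visit(child, j)
            return total

        return visit(self.root, 0)
```

Both updates touch only one root-to-node path, so their cost grows with the transaction length rather than the database size. A targeted query could build on `support` to score candidate supersets of the requested items; how IDPI+ enumerates those candidates, and how Query-Tree batches many queries, is not modelled here.
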
Frequent Itemset Mining with Parallel RDBMS

Depth-first Frequent Itemset Mining in Relational Databases

Efficient Frequent Pattern Mining in Relational Databases

SQL Based Frequent Pattern Mining with FP-Growth

SQL Based Frequent Pattern Mining without Candidate Generation

Mining Noise-Tolerant Frequent Closed Itemsets in Very Large Database

Processing Sequential Patterns in Relational Databases

Finding Frequent Closed Itemsets in Sliding Window in Linear Time

Mining Frequent Item Sets by Opportunistic Projection

Research on Frequent Itemsets Mining Algorithm Based on Relational Database

Distributed Mining of Frequent Patterns in Big Data by Hybrid Strategies

A Stable Parallel Distributed Frequent Itemset Mining Algorithm and Its Application

Constructing Projection Frequent Pattern Tree for Efficient Mining

Efficiently Mining Frequent Itemsets on Massive Data

Mining Productive Itemsets in Dynamic Databases

Mining Frequent Items in Spatio-temporal Databases

An Efficient Method for the Parallel Mining of Frequent Itemsets in Very Large Text Databases

HPFP-Miner: A Novel Parallel Frequent Itemset Mining Algorithm

Parallel Mining of Top-k Frequent Itemsets in Very Large Text Database