Abstract: Frequent itemset mining is a popular and important first step in the analysis of data arising in a broad range of applications. The traditional "exact" model for frequent itemsets requires that every item occur in each supporting transaction. However, real data is typically subject to noise and measurement error. To date, the effect of noise on exact frequent pattern mining algorithms has been addressed primarily through simulation studies, and there has been limited attention to the development of noise-tolerant algorithms. In this paper we propose a noise-tolerant itemset model, which we call approximate frequent itemsets (AFI). Like frequent itemsets, the AFI model requires that an itemset have a minimum number of supporting transactions. However, the AFI model tolerates a controlled fraction of errors in each item and each supporting transaction. Motivating this model are theoretical results (and a supporting simulation study presented here) which state that, in the presence of even low levels of noise, large frequent itemsets are broken into fragments of logarithmic size; thus the itemsets cannot be recovered by a routine application of frequent itemset mining. By contrast, we provide theoretical results showing that the AFI criterion is well suited to the recovery of block structures subject to noise. We develop and implement an algorithm to mine AFIs that generalizes the level-wise enumeration of frequent itemsets by allowing noise. We propose the noise-tolerant support threshold, a relaxed version of support, which varies with the length of the itemset and the noise threshold. We exhibit an Apriori property that permits the pruning of an itemset if any of its sub-itemsets is not sufficiently supported. Several experiments demonstrate that the AFI algorithm recovers frequent patterns under noisy conditions better than existing frequent itemset mining approaches. Noise-tolerant support pruning also yields an order-of-magnitude performance gain over existing methods.
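
The AFI criterion described in the abstract can be illustrated with a minimal sketch. It is not the paper's mining algorithm; it only checks whether a given set of transactions and items satisfies the criterion as the abstract states it. The function names, the parameter names `eps_row` and `eps_col` (the tolerated fraction of missing items per transaction and per item), and the toy data are assumptions made for illustration.

```python
from typing import Sequence, Set


def exact_support(transactions: Sequence[Set[str]], items: Sequence[str]) -> int:
    """Count transactions that contain every item (the traditional exact model)."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= t)


def is_afi(
    transactions: Sequence[Set[str]],
    rows: Sequence[int],
    items: Sequence[str],
    eps_row: float,
    eps_col: float,
    min_support: int,
) -> bool:
    """Check an illustrative AFI criterion on a candidate block.

    rows        : indices of the candidate supporting transactions
    items       : the candidate itemset
    eps_row     : assumed max fraction of items missing from any one transaction
    eps_col     : assumed max fraction of transactions missing any one item
    min_support : minimum number of supporting transactions
    """
    if len(rows) < min_support:
        return False
    # Each supporting transaction may miss at most eps_row * |items| items.
    for r in rows:
        missing = sum(1 for i in items if i not in transactions[r])
        if missing > eps_row * len(items):
            return False
    # Each item may be missing from at most eps_col * |rows| transactions.
    for i in items:
        missing = sum(1 for r in rows if i not in transactions[r])
        if missing > eps_col * len(rows):
            return False
    return True


# Toy data: a 4x4 block in which every transaction has lost exactly one item.
toy = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"b", "c", "d"},
]
itemset = ["a", "b", "c", "d"]
all_rows = [0, 1, 2, 3]

print(exact_support(toy, itemset))
print(is_afi(toy, all_rows, itemset, eps_row=0.25, eps_col=0.25, min_support=4))
```

In this toy block no transaction contains all four items, so the exact model assigns the itemset zero support, while the relaxed check accepts the block once a 25% error rate is tolerated in each row and each column. Tightening either tolerance below 25% makes the check fail.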
