Abstract: Frequent itemset mining is a popular and important first step in the analysis of data arising in a broad range of applications. The traditional “exact” model for frequent itemsets requires that every item occur in each supporting transaction. However, real data is typically subject to noise and measurement error. To date, the effect of noise on exact frequent pattern mining algorithms has been addressed primarily through simulation studies, and there has been limited attention to the development of noise-tolerant algorithms. In this paper we propose a noise-tolerant itemset model, which we call approximate frequent itemsets (AFI). Like frequent itemsets, the AFI model requires that an itemset have a minimum number of supporting transactions. However, the AFI model tolerates a controlled fraction of errors in each item and each supporting transaction. Motivating this model are theoretical results (and a supporting simulation study presented here) which state that, in the presence of even low levels of noise, large frequent itemsets are broken into fragments of logarithmic size; thus the itemsets cannot be recovered by a routine application of frequent itemset mining. By contrast, we provide theoretical results showing that the AFI criterion is well suited to the recovery of block structures subject to noise. We developed and implemented an algorithm to mine AFIs that generalizes the level-wise enumeration of frequent itemsets by allowing noise. We propose the noise-tolerant support threshold, a relaxed version of support, which varies with the length of the itemset and the noise threshold. We exhibit an Apriori property that permits the pruning of an itemset if any of its sub-itemsets is not sufficiently supported. Several experiments presented here demonstrate that the AFI algorithm recovers frequent patterns under noisy conditions better than existing frequent itemset mining approaches. Noise-tolerant support pruning also yields an order-of-magnitude performance gain over existing methods.
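
The AFI criterion described in the abstract can be illustrated with a small membership check. The following is a minimal sketch, not the paper's mining algorithm: the function name `is_afi` and the parameters `eps_row`, `eps_col`, and `min_support` are hypothetical names chosen here, and the sketch assumes that the "controlled fraction of errors" is a simple upper bound on the share of missing entries per supporting transaction and per item.

```python
from typing import Hashable, Sequence, Set


def is_afi(
    transactions: Sequence[Set[Hashable]],
    rows: Sequence[int],
    items: Sequence[Hashable],
    eps_row: float,
    eps_col: float,
    min_support: int,
) -> bool:
    """Check whether `items`, supported by the transactions indexed by `rows`,
    satisfies an AFI-style criterion (assumed form, see the lead-in)."""
    rows = list(rows)
    items = list(items)

    # Like an exact frequent itemset, an AFI needs enough supporting transactions.
    if not items or len(rows) < min_support:
        return False

    # Each supporting transaction may miss at most a fraction eps_row of the items.
    for r in rows:
        missing = sum(1 for i in items if i not in transactions[r])
        if missing > eps_row * len(items):
            return False

    # Each item may be absent from at most a fraction eps_col of the supporting transactions.
    for i in items:
        missing = sum(1 for r in rows if i not in transactions[r])
        if missing > eps_col * len(rows):
            return False

    return True
```

Setting both tolerances to zero reduces the check to the exact model, in which every item must occur in every supporting transaction.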

Summary queries for frequent itemsets mining

A New Algorithm for Mining Global Frequent Itemsets in a Stream.

Mining Associated and Item-Item Correlated Frequent Patterns

Approximate mining of global closed frequent itemsets over data streams

Using Quantitative Association Rules in Collaborative Filtering

Mining Noise-Tolerant Frequent Closed Itemsets in Very Large Database.

Knowledge and Information Systems RESEARCH ARTICLE

Mining summarization of high utility itemsets

Mining Maximum Length Frequent Itemsets: A Summary of Results

On Efficiently Summarizing Categorical Databases

SUMMARY: Efficiently Summarizing Transactions for Clustering

An efficient approach for interactive mining of frequent itemsets

FRI-Miner: Fuzzy Rare Itemset Mining

Non-Almost-Derivable Frequent Itemsets Mining

Mining Approximate Frequent Itemsets from Noisy Data

Mining Approximate Frequent Itemsets in the Presence of Noise: Algorithm and Analysis

Discovery of Maximal Frequent Item Sets using Subset Creation

A Concise Representation of Generalized Frequent Itemsets Based on Profile Summary

FHUQI-Miner: Fast high utility quantitative itemset mining

Mining High Occupancy Itemsets.

IWFPM: Interested Weighted Frequent Pattern Mining with Multiple Supports