Abstract: Frequent item mining, which finds items that occur frequently in a data stream over a period of time, is one of the most heavily studied problems in data stream mining. A generalization of frequent item mining is persistent item mining: a persistent item, unlike a frequent item, does not necessarily occur more often than other items over a short period of time, but rather persists and occurs more often over a long period of time. To the best of our knowledge, there is no prior work on mining persistent items in a data stream. In this paper, we address the fundamental problem of finding persistent items in a given data stream during a given period of time at any given observation point. We propose a novel scheme, PIE, that can accurately identify each persistent item while keeping the false negative rate (FNR) below any desired target and using a very small amount of memory. The key idea of PIE is to use Raptor codes to encode the ID of each item that appears at the observation point during a measurement period, and to store only a few bits of the encoded ID in the memory of that observation point for that measurement period. A persistent item occurs in enough measurement periods that enough encoded bits of its ID can be retrieved from the observation point to decode the ID correctly. We implemented and extensively evaluated PIE using three real network traffic traces and compared its performance with two prior schemes adapted to this problem. Our results show that PIE not only achieves the desired FNR in every scenario, but its FNR is also, on average, 19.5 times smaller than that of the best adapted prior scheme.
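The encode-a-few-bits-per-period idea can be illustrated with a toy sketch. This is not PIE itself: it swaps Raptor codes for a much simpler random linear fountain code over GF(2), stores exactly one parity bit per measurement period, and assumes the recovered bits have already been attributed to a single item (a step the abstract does not describe). The names `ID_BITS`, `period_mask`, `encode_bit`, and `decode` are hypothetical and chosen only for illustration.

```python
import hashlib
from typing import List, Optional, Tuple

ID_BITS = 16  # assumed ID width for this toy example


def parity(x: int) -> int:
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def period_mask(period: int) -> int:
    """Derive a pseudo-random, nonzero ID_BITS-wide mask from the period index."""
    digest = hashlib.sha256(period.to_bytes(8, "big")).digest()
    mask = int.from_bytes(digest[:4], "big") & ((1 << ID_BITS) - 1)
    return mask or 1


def encode_bit(item_id: int, period: int) -> int:
    """One encoded bit per period: a parity of a pseudo-random subset of ID bits."""
    return parity(item_id & period_mask(period))


def decode(observations: List[Tuple[int, int]]) -> Optional[int]:
    """Recover the ID from (period, bit) pairs, or None if too few bits were seen.

    Each pair is one linear equation over GF(2); Gaussian elimination solves them.
    """
    basis = {}  # pivot bit position -> (mask, bit), pivot is the mask's top bit
    for period, bit in observations:
        m, b = period_mask(period), bit
        while m:
            p = m.bit_length() - 1
            if p not in basis:
                basis[p] = (m, b)
                break
            m ^= basis[p][0]
            b ^= basis[p][1]
    if len(basis) < ID_BITS:
        return None  # not enough independent equations: item not persistent enough
    item_id = 0
    for p in range(ID_BITS):  # back-substitute from the lowest pivot upward
        m, b = basis[p]
        lower = m & ~(1 << p)
        if b ^ parity(lower & item_id):
            item_id |= 1 << p
    return item_id


persistent_id = 0xBEEF
seen = [(t, encode_bit(persistent_id, t)) for t in range(64)]
recovered = decode(seen)       # item observed in 64 periods
too_few = decode(seen[:5])     # item observed in only 5 periods
```

In this sketch, an item seen in only a handful of periods contributes fewer than `ID_BITS` independent equations, so `decode` returns `None`, while an item seen in many periods yields a solvable system. That mirrors the abstract's claim that only persistent items leave enough encoded bits to be decoded.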

Finding Significant Items in Data Streams.

LTC: A Fast Algorithm to Accurately Find Significant Items in Data Streams

A New Algorithm for Mining Global Frequent Itemsets in a Stream.

Approximate mining of global closed frequent itemsets over data streams

SSS: an Accurate and Fast Algorithm for Finding Top-k Hot Items in Data Streams

Frequent Items Mining Based on Weight in Data Stream

Gc-Tree: A Fast Online Algorithm For Mining Frequent Closed Itemsets

Mining Noise-Tolerant Frequent Closed Itemsets in Very Large Database.

Local Differentially Private Heavy Hitter Detection in Data Streams with Bounded Memory

WavingSketch: An Unbiased and Generic Sketch for Finding Top-k Items in Data Streams

Finding Persistent Items in Data Streams

Finding frequent items in data streams using hierarchical information.

Finding Frequent Items in Time Decayed Data Streams.

Mining Top-k Minimal Redundancy Frequent Patterns over Uncertain Databases.

Finding needles in a hay stream: On persistent item lookup in data streams

Novel structures for counting frequent items in time decayed streams

Finding the Hottest Item in Data Streams

DELAY: A Lazy Approach for Mining Frequent Patterns over High Speed Data Streams

Fine-grained Probability Counting for Cardinality Estimation of Data Streams.

An Algorithm of Mining Frequent Itemsets Based on Bloom Filter

Bubble Sketch: A High-performance and Memory-efficient Sketch for Finding Top-K Items in Data Streams