Abstract: Traffic anomalies such as failures and attacks are increasing in frequency and severity, so identifying them rapidly and accurately is critical for large network operators. Detection typically treats the traffic as a collection of flows and looks for heavy changes in traffic patterns (e.g., volume, number of connections). However, as link speeds and the number of flows increase, keeping per-flow state does not scale. The recently proposed sketch-based schemes [14] are among the very few that can detect heavy changes and anomalies over massive data streams at network traffic speeds. However, sketches do not preserve the keys (e.g., source IP addresses) of the flows. Hence, even if anomalies are detected, it is difficult to infer the culprit flows, which is a major practical hurdle for online deployment. Meanwhile, the number of keys is too large to record. To address this challenge, we propose efficient reversible hashing algorithms that infer the keys of culprit flows from sketches without storing any explicit key information. No extra memory or memory accesses are needed for recording the streaming data. Meanwhile, the heavy change detection daemon runs in the background with space complexity and computational time sublinear in the key space size. This short paper describes the conceptual framework of reversible sketches, as well as some initial approaches for implementation. See [23] for details of the optimized algorithms. We further apply various IP-mangling algorithms and bucket classification methods to reduce false positives and false negatives. Evaluated with NetFlow traffic traces from a large edge router, we demonstrate that reverse hashing can quickly infer the keys of culprit flows with high accuracy, even when there are many changes.
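To make the key-recovery problem concrete, here is a minimal Python sketch of sketch-based heavy change detection. It is not the reversible hashing algorithm proposed in this paper. It only illustrates why a plain sketch cannot name the culprit flows, using a brute-force scan of the key space as the naive baseline. The class name `KArySketch`, the helper `brute_force_culprits`, the SHA-256-based hash functions, the table sizes, and the use of raw bucket values as the change estimate are all assumptions made for illustration.

```python
import hashlib


class KArySketch:
    """Illustrative sketch: several hash tables of counters, no keys stored."""

    def __init__(self, num_tables=4, num_buckets=64):
        self.num_tables = num_tables
        self.num_buckets = num_buckets
        self.tables = [[0] * num_buckets for _ in range(num_tables)]

    def bucket(self, table_idx, key):
        # Hypothetical hash choice: SHA-256 over "table:key", reduced mod K.
        digest = hashlib.sha256(f"{table_idx}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def update(self, key, value):
        # One counter per table is touched; the key itself is discarded.
        for i in range(self.num_tables):
            self.tables[i][self.bucket(i, key)] += value

    def minus(self, other):
        # Sketches are linear, so the bucket-wise difference of two sketches
        # is the sketch of the per-key change between the two intervals.
        diff = KArySketch(self.num_tables, self.num_buckets)
        for i in range(self.num_tables):
            diff.tables[i] = [a - b for a, b in zip(self.tables[i], other.tables[i])]
        return diff

    def heavy_buckets(self, threshold):
        # Per table, the set of bucket indices whose absolute change is large.
        return [
            {j for j, v in enumerate(row) if abs(v) >= threshold}
            for row in self.tables
        ]


def brute_force_culprits(diff, threshold, key_space):
    """Naive baseline: test every candidate key against the heavy buckets.

    The cost is linear in the size of the key space, which is the cost a
    reversible sketch is meant to avoid.
    """
    heavy = diff.heavy_buckets(threshold)
    return [
        key
        for key in key_space
        if all(diff.bucket(i, key) in heavy[i] for i in range(diff.num_tables))
    ]
```

A possible use: fill one `KArySketch` per time interval, call `after.minus(before)`, and pass the result to `brute_force_culprits` together with a threshold and the candidate keys. With 32-bit source IP addresses as keys, the scan would have to cover 2^32 candidates, which is why inferring keys directly from the heavy buckets matters.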

Double-Anonymous Sketch: Achieving Top-K-fairness for Finding Global Top-K Frequent Items

WavingSketch: An Unbiased and Generic Sketch for Finding Top-k Items in Data Streams

OneSketch: A Generic and Accurate Sketch for Data Streams

Bubble Sketch: A High-performance and Memory-efficient Sketch for Finding Top-K Items in Data Streams

Discussion On Fast And Accurate Sketches For Skewed Data Streams: A Case Study

HistSketch: A Compact Data Structure for Accurate Per-Key Distribution Monitoring

SF-Sketch: A Two-Stage Sketch for Data Streams

OrderSketch: An Unbiased and Fast Sketch for Frequency Estimation of Data Streams

Detecting Top-k Flows Combining Probabilistic Sketch and Sliding Window

DISCO: A Dynamically Configurable Sketch Framework in Skewed Data Streams

Scalable Overspeed Item Detection in Streams

SSS: an Accurate and Fast Algorithm for Finding Top-k Hot Items in Data Streams

Stingy Sketch

PrivSketch: A Private Sketch-based Frequency Estimation Protocol for Data Streams

Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

An Accurate Estimation Algorithm for Big Data Streams

Finding Simplex Items in Data Streams

Finding Significant Items in Data Streams

2FA Sketch: Two-Factor Armor Sketch for Accurate and Efficient Heavy Hitter Detection in Data Streams

Streaming Data Collection With a Private Sketch-Based Protocol

Cuckoo Counter: Adaptive Structure of Counters for Accurate Frequency and Top-k Estimation