Abstract: Cardinality sketches are compact data structures for representing sets or vectors, enabling efficient approximation of their cardinality (or the number of nonzero entries). These sketches are space-efficient, typically requiring only logarithmic storage relative to input size, and support incremental updates, allowing for dynamic modifications. A critical property of many cardinality sketches is composability, meaning that the sketch of a union of sets can be computed from the individual sketches. Existing designs typically provide strong statistical guarantees, accurately answering a number of queries exponential in the sketch size $k$. However, when queries are adaptive, that is, when they may depend on previous responses, these guarantees degrade to a number of queries quadratic in $k$. Prior works on statistical queries (Steinke and Ullman, 2015) and on specific MinHash cardinality sketches (Ahmadian and Cohen, 2024) established that the quadratic bound on the number of adaptive queries is, in fact, unavoidable. In this work, we develop a unified framework that generalizes these results across broad classes of cardinality sketches. We show that any union-composable sketching map is vulnerable to attack with $\tilde{O}(k^4)$ queries and, if the sketching map is also monotone (as for MinHash and statistical queries), we obtain a tight bound of $\tilde{O}(k^2)$ queries. Additionally, we demonstrate that linear sketches over the reals $\mathbb{R}$ and over finite fields $\mathbb{F}_p$ can be attacked using $\tilde{O}(k^2)$ adaptive queries, which is optimal; this strengthens some recent results of Gribelyuk et al. (2024), which required a larger polynomial number of rounds for such matrices.
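To make union-composability concrete, here is a minimal sketch of a bottom-k variant of MinHash in Python. It assumes SHA-256 (truncated to 53 bits) as a stand-in for an ideal random hash and uses the standard (k - 1)/τ_k estimator, where τ_k is the k-th smallest hash value. The function names `bottom_k_sketch`, `merge`, and `estimate_cardinality` are hypothetical, and this is an illustration under those assumptions rather than the construction analyzed in the paper; it does not implement any adaptive attack.

```python
import hashlib
import heapq


def _hash01(item: str, seed: int = 0) -> float:
    """Map an item to a pseudo-random value in [0, 1).

    Assumption: SHA-256 truncated to 53 bits stands in for an ideal random hash.
    """
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def bottom_k_sketch(items, k: int) -> list[float]:
    """Bottom-k MinHash sketch: the k smallest distinct hash values, ascending."""
    return heapq.nsmallest(k, {_hash01(x) for x in items})


def merge(sketch_a: list[float], sketch_b: list[float], k: int) -> list[float]:
    """Union-composability: the bottom-k of the two sketches is the sketch of A ∪ B."""
    return heapq.nsmallest(k, set(sketch_a) | set(sketch_b))


def estimate_cardinality(sketch: list[float], k: int) -> float:
    """Bottom-k estimate (k - 1) / tau_k, where tau_k is the k-th smallest hash value."""
    if len(sketch) < k:
        return float(len(sketch))  # fewer than k distinct items: the count is exact
    return (k - 1) / sketch[-1]


# Usage: two overlapping sets whose union has 10,000 distinct items.
k = 64
a = [f"user{i}" for i in range(0, 6000)]
b = [f"user{i}" for i in range(4000, 10000)]
sa, sb = bottom_k_sketch(a, k), bottom_k_sketch(b, k)
union_sketch = merge(sa, sb, k)
assert union_sketch == bottom_k_sketch(a + b, k)  # merging matches sketching the union directly
print(round(estimate_cardinality(union_sketch, k)))
```

The printed estimate should land near the true union size of 10,000, with relative error on the order of 1/√k. Because the merged sketch equals the sketch of the union exactly, the sketch of A ∪ B can be computed without access to A or B themselves, which is the composability property described above.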

MTS Sketch for Accurate Estimation of Set-Expression Cardinalities from Small Samples

Sampling Space-Saving Set Sketches

QSketch: An Efficient Sketch for Weighted Cardinality Estimation in Streams

Convolution and Cross-Correlation of Count Sketches Enables Fast Cardinality Estimation of Multi-Join Queries

An Accurate Estimation Algorithm for Big Data Streams

Generalized Sketches for Streaming Sets

OneSketch: A Generic and Accurate Sketch for Data Streams

Cardinality Estimation Meets Good-Turing

Graphical Model Sketch

Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming

Discussion On Fast And Accurate Sketches For Skewed Data Streams: A Case Study

SimiSketch: Efficiently Estimating Similarity of streaming Multisets

Simple and Efficient Cardinality Estimation in Data Streams

One Attack to Rule Them All: Tight Quadratic Bounds for Adaptive Queries on Cardinality Sketches

gSketch: On Query Estimation in Graph Streams

Hyper-USS: Answering Subset Query over Multi-Attribute Data Stream

Leveraging Discarded Samples for Tighter Estimation of Multiple-Set Aggregates

OmniSketch: Efficient Multi-Dimensional High-Velocity Stream Analytics with Arbitrary Predicates

Estimating Cardinalities with Deep Sketches

Approaching 100% Confidence in Stream Summary through ReliableSketch

SQUAD: Combining Sketching and Sampling Is Better than Either for Per-item Quantile Estimation