Abstract: We present novel reductions from sample compression schemes in multiclass classification, regression, and adversarially robust learning settings to binary sample compression schemes. Suppose we have a compression scheme for binary concept classes of size $f(d_\mathrm{VC})$, where $d_\mathrm{VC}$ is the VC dimension. Then we obtain the following results: (1) If the binary compression scheme is a majority-vote or a stable compression scheme, then there exists a multiclass compression scheme of size $O(f(d_\mathrm{G}))$, where $d_\mathrm{G}$ is the graph dimension. For general binary compression schemes, we obtain a compression scheme of size $O(f(d_\mathrm{G})\log|Y|)$, where $Y$ is the label space. (2) If the binary compression scheme is a majority-vote or a stable compression scheme, then there exists an $\epsilon$-approximate compression scheme for regression over $[0,1]$-valued functions of size $O(f(d_\mathrm{P}))$, where $d_\mathrm{P}$ is the pseudo-dimension. For general binary compression schemes, we obtain a compression scheme of size $O(f(d_\mathrm{P})\log(1/\epsilon))$. These results would have significant implications if the sample compression conjecture (Littlestone and Warmuth, 1986; Floyd and Warmuth, 1995; Warmuth, 2003), which posits that any binary concept class with finite VC dimension admits a binary compression scheme of size $O(d_\mathrm{VC})$, were resolved: our results would then immediately extend the proof of the conjecture to these other settings. We establish similar results for adversarially robust learning. We also give an example of a concept class that is robustly learnable but has no bounded-size compression scheme. This demonstrates that learnability is not equivalent to having a compression scheme whose size is independent of the sample size, unlike in binary classification, where a compression scheme of size $2^{O(d_\mathrm{VC})}$ is attainable (Moran and Yehudayoff, 2016).
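To make result (1) more concrete, the following is a minimal toy sketch in Python. It is not the construction from the paper. It assumes a hypothetical multiclass class over the real line in which each bit of the label is an independent threshold function $x \mapsto \mathbb{1}[x \ge t_j]$. The binary scheme for thresholds (VC dimension 1) keeps at most one point. Running it once per label bit gives a combined compression set of at most $\lceil \log_2 |Y| \rceil$ points, which illustrates where a $\log|Y|$ factor can arise when a general binary scheme is applied bit by bit. All function names are hypothetical. Tagging each kept point with its bit index is side information assumed here for simplicity.

```python
from math import ceil, log2
from typing import Callable, Dict, List, Tuple

Example = Tuple[float, int]  # (point x, label y)


# --- Binary scheme for thresholds h_t(x) = 1[x >= t] (VC dimension 1) ---
def compress_threshold(sample: List[Example]) -> List[float]:
    """Keep only the smallest positively labeled point (at most one point)."""
    positives = [x for x, y in sample if y == 1]
    return [min(positives)] if positives else []


def reconstruct_threshold(kept: List[float]) -> Callable[[float], int]:
    """Predict 1 iff x is at least the kept point; predict 0 everywhere if nothing was kept."""
    if not kept:
        return lambda x: 0
    t = kept[0]
    return lambda x: int(x >= t)


# --- Toy multiclass scheme (hypothetical): run the binary scheme once per label bit ---
def compress_multiclass(sample: List[Example], num_labels: int) -> Dict[int, List[float]]:
    """Assumes every bit of the labels is realizable by some threshold (toy class)."""
    num_bits = max(1, ceil(log2(num_labels)))
    return {
        # Binary sample for bit j: same points, label replaced by its j-th bit.
        j: compress_threshold([(x, (y >> j) & 1) for x, y in sample])
        for j in range(num_bits)
    }


def reconstruct_multiclass(kept: Dict[int, List[float]]) -> Callable[[float], int]:
    """Rebuild each bit predictor from its kept point and reassemble the label."""
    bit_predictors = {j: reconstruct_threshold(pts) for j, pts in kept.items()}
    return lambda x: sum(h(x) << j for j, h in bit_predictors.items())


if __name__ == "__main__":
    # Labels generated by t_0 = 0.3, t_1 = 0.6: y = 1[x >= 0.3] + 2 * 1[x >= 0.6]
    sample = [(0.1, 0), (0.25, 0), (0.4, 1), (0.5, 1), (0.7, 3), (0.9, 3)]
    kept = compress_multiclass(sample, num_labels=4)
    print(kept)  # {0: [0.4], 1: [0.7]}
    h = reconstruct_multiclass(kept)
    assert all(h(x) == y for x, y in sample)  # consistent on the whole sample
    print(sum(len(v) for v in kept.values()))  # 2
```

In this toy case, each bit-projected class is exactly a threshold class, so every per-bit scheme has size 1. For general classes, the bit-projected classes need not be this simple, which is why the bounds above are stated in terms of the graph dimension $d_\mathrm{G}$.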

Sample Compression Unleashed: New Generalization Bounds for Real Valued Losses

Sample Compression Hypernetworks: From Generalization Bounds to Meta-Learning

Compression, Generalization and Learning

Agnostic Sample Compression Schemes for Regression

Data-dependent Generalization Bounds via Variable-Size Compressibility

Sample Compression Scheme Reductions

Sample Compression, Support Vectors, and Generalization in Deep Learning

Loss Gradient Gaussian Width based Generalization and Optimization Guarantees

De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and Non-smooth Predictors

Do Compressed Representations Generalize Better?

Sample Complexity Bounds for 1-bit Compressive Sensing and Binary Stable Embeddings with Generative Priors

A New Family of Generalization Bounds Using Samplewise Evaluated CMI

Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network

On statistical learning via the lens of compression

Learning Non-Vacuous Generalization Bounds from Optimization

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Understanding Generalization in Deep Learning via Tensor Methods

Finite Sample Analysis and Bounds of Generalization Error of Gradient Descent in In-Context Linear Regression

Information-theoretic generalization bounds for black-box learning algorithms

Rethinking Information-theoretic Generalization: Loss Entropy Induced PAC Bounds

On Compression Principle and Bayesian Optimization for Neural Networks