Abstract: We establish a tight characterization of the worst-case rates for the excess risk of agnostic learning with sample compression schemes and for uniform convergence for agnostic sample compression schemes. In particular, we find that the optimal rates of convergence for size-$k$ agnostic sample compression schemes are of the form $\sqrt{\frac{k \log(n/k)}{n}}$, which contrasts with agnostic learning with classes of VC dimension $k$, where the optimal rates are of the form $\sqrt{\frac{k}{n}}$.
### What problem does this paper attempt to solve?
The main objective of this paper is to give a tight lower bound for sample compression schemes in agnostic learning. Specifically, it addresses the following two key questions:
1. **Optimal convergence rate of sample compression schemes in agnostic learning**:
- For an agnostic sample compression scheme of size \( k \), can the upper bound of the expected generalization error be further optimized? In particular, can the logarithmic factor \( \log(n/k) \) be removed?
- Existing research shows that in the realizable case, this logarithmic factor can be removed, but for the agnostic case, this question has not been clearly answered yet.
2. **Lower bound of uniform convergence**:
- For an agnostic sample compression scheme, what is the worst-case rate of uniform convergence? Is there a tight lower bound for this rate?
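To make the object in these questions concrete, here is a minimal sketch of an agnostic sample compression scheme of size \( k = 1 \). The hypothesis class (one-dimensional thresholds \( x \mapsto 1[x \geq t] \), plus the all-zero classifier) and the function names `compress`, `reconstruct`, and `empirical_error` are illustrative assumptions, not taken from the paper. The scheme keeps at most one labeled point, and the classifier rebuilt from that point has empirical error no larger than that of any threshold on the sample.

```python
def empirical_error(predict, sample):
    """Fraction of labeled points (x, y) in the sample that predict gets wrong."""
    return sum(predict(x) != y for x, y in sample) / len(sample)


def reconstruct(compressed):
    """Rebuild a threshold classifier from a compression set of size at most 1.

    The empty set encodes the all-zero classifier (hypothetical convention).
    """
    if not compressed:
        return lambda x: 0
    t, _label = compressed[0]
    return lambda x: int(x >= t)


def compress(sample):
    """Keep the single sample point (or none) whose threshold has the lowest
    empirical error. Every threshold behaves on the sample like one of these
    candidates, so the result matches the best threshold on the sample."""
    candidates = [()] + [(point,) for point in sample]
    return min(candidates, key=lambda c: empirical_error(reconstruct(c), sample))


# Noisy (non-realizable) sample: no threshold labels every point correctly.
sample = [(0.1, 0), (0.4, 1), (0.5, 0), (0.7, 1), (0.9, 1)]
kept = compress(sample)
print(kept, empirical_error(reconstruct(kept), sample))
```

The questions above concern how far the true risk of such a reconstructed classifier can be from the best achievable risk, as a function of the sample size \( n \) and the compression size \( k \).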
#### Main contributions
The paper shows, by an explicit construction, that the logarithmic factor \( \log(n/k) \) is necessary for agnostic sample compression schemes. The specific results are as follows:
- **Theorem 1**: For any \( n, k\in\mathbb{N} \) with \( |X|\geq n\geq ck \), where \( c \) is a constant, the expected excess risk in agnostic learning satisfies the lower bound:
\[
E_{\text{ag}}(n,k)\gtrsim\sqrt{\frac{k\log(n/k)}{n}}
\]
- **Theorem 2**: A similar lower bound holds for uniform convergence:
\[
E_{\text{uc}}(n,k)\gtrsim\sqrt{\frac{k\log(n/k)}{n}}
\]
These results, combined with the known upper bounds, give a tight characterization of the worst-case convergence rate of agnostic sample compression schemes.
In addition, the paper studies order-dependent sample compression schemes and proves a similar conclusion: the logarithmic factor \( \log(n) \) is also necessary in that case.
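As a rough numerical illustration of the gap between the two rates quoted in the abstract, the sketch below evaluates \( \sqrt{k\log(n/k)/n} \) and \( \sqrt{k/n} \) with all constants dropped. The function names and the sample values of \( n \) and \( k \) are arbitrary choices for illustration, so the numbers indicate only how the rates scale, not actual error bounds.

```python
import math


def compression_rate(n, k):
    """sqrt(k * log(n/k) / n): rate for size-k agnostic compression schemes
    (constants dropped)."""
    return math.sqrt(k * math.log(n / k) / n)


def vc_rate(n, k):
    """sqrt(k / n): rate for agnostic learning with VC dimension k
    (constants dropped)."""
    return math.sqrt(k / n)


k = 10
for n in (100, 10_000, 1_000_000):
    ratio = compression_rate(n, k) / vc_rate(n, k)
    print(f"n={n:>9}  compression={compression_rate(n, k):.4f}  "
          f"vc={vc_rate(n, k):.4f}  ratio={ratio:.3f}")
```

The ratio of the two rates is \( \sqrt{\log(n/k)} \): both rates go to zero as \( n \) grows, but the gap between them widens slowly and is never bounded by a constant.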
#### Conclusion
Overall, this paper settles an important theoretical question about sample compression schemes in agnostic learning, namely whether the logarithmic factor \( \log(n/k) \) can be removed. Through an explicit construction and rigorous proof, it shows that this factor cannot be removed, which gives a tight characterization of the rates for this setting.