Abstract: We control the probability of the uniform deviation between the empirical and generalization performances of multi-category classifiers by an empirical L1-norm covering number, when these performances are defined on the basis of the truncated hinge loss function. The only assumption made on the functions implemented by multi-category classifiers is that they are of bounded variation (BV). For such classifiers, we derive a sample size estimate sufficient for these two performances to be close with high probability. In particular, we are interested in the dependency of this estimate on the number C of classes. To this end, we first upper bound the fat-shattering dimension (the scale-sensitive version of the VC-dimension) of sets of BV functions defined on R^d, which gives an O(1/epsilon^d) bound as the scale epsilon goes to zero. Second, we provide a sharper decomposition result for the fat-shattering dimension in terms of C, which for sets of BV functions gives an improvement from O(C^(d/2+1)) to O(C ln^2(C)). This improvement then propagates to the sample complexity estimate.
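To make the loss mentioned in the abstract concrete, here is a minimal Python sketch. The abstract does not spell out the definition, so the sketch assumes one common form of the truncated hinge (margin) loss with margin parameter `gamma`: loss 1 for a non-positive margin, linear decay on (0, gamma), and loss 0 from gamma onward. The function names, the parameter `gamma`, and the choice of multi-class margin (true-class score minus the best competing score) are assumptions made for illustration and may differ from the paper's exact definitions.

```python
def truncated_hinge(margin, gamma=1.0):
    """Assumed truncated hinge loss: 1 - margin/gamma, clipped to [0, 1]."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return min(1.0, max(0.0, 1.0 - margin / gamma))


def multiclass_margin(scores, label):
    """Assumed margin: score of the true class minus the best other score."""
    true_score = scores[label]
    best_other = max(s for k, s in enumerate(scores) if k != label)
    return true_score - best_other


def empirical_margin_risk(score_rows, labels, gamma=1.0):
    """Average truncated hinge loss over a sample (the empirical performance)."""
    losses = [
        truncated_hinge(multiclass_margin(scores, label), gamma)
        for scores, label in zip(score_rows, labels)
    ]
    return sum(losses) / len(losses)


# Three examples, C = 3 classes: margins are 2.0, 0.5 and -1.0.
score_rows = [[2.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 1.0, 0.0]]
labels = [0, 1, 0]
risk = empirical_margin_risk(score_rows, labels, gamma=1.0)
```

The uniform deviation studied in the abstract is the largest gap, over all classifiers in the class, between this empirical quantity and its expectation under the data distribution.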

What problem does this paper attempt to address?

This paper addresses the estimation of sample complexity for multi-category classifiers. Specifically, the author studies how to control the probability of the uniform deviation between the empirical performance and the generalization performance of multi-category classifiers, under the truncated hinge loss function, by means of the empirical L1-norm covering number. The only assumption of the paper is that the functions implemented by these classifiers are of bounded variation (BV). Under this assumption, the author derives a sample size estimate sufficient for the two performances to be close with high probability. In particular, the author studies how this estimate depends on the number of classes \(C\).

### Main Contributions

1. **Sample Complexity Estimation**: The author controls the probability of the uniform deviation between the empirical performance and the generalization performance of multi-category classifiers through the empirical L1-norm covering number, and derives a sample size estimate from it.
2. **Dependence on the Number of Classes \(C\)**: The author studies how the sample complexity estimate depends on the number of classes \(C\) and provides a sharper decomposition result, improving the dependence from \(O(C^{d/2 + 1})\) to \(O(C\ln^2(C))\).

### Technical Details

- **Bounded Variation Function**: The paper assumes that the functions implemented by the classifiers are of bounded variation (BV). BV functions are the setting of Helly's selection theorem, which underlies the compact embedding of the BV space into the L1 space.
- **Empirical L1-norm Covering Number**: The author uses the empirical L1-norm covering number to control the performance deviation of multi-category classifiers, an approach suited to the multi-class setting.
- **Decomposition Result**: The author provides a new decomposition result for the fat-shattering dimension, which improves the dependence on the number of classes from \(O(C^{d/2 + 1})\) to \(O(C\ln^2(C))\).

### Formula Summary

- **Upper Bound of Fat-Shattering Dimension**:
  \[ d_F(\epsilon)\leq\left(1 + A\sqrt{\frac{VK}{\epsilon}}\right)^d \]
- **Bound on the Empirical L1-norm Covering Number**:
  \[ \ln N(\epsilon, F, d_1, t_n)\leq KM\left(\sqrt{\frac{dAVK_P}{\epsilon}}\right)^d\left(\frac{2}{\epsilon}\right)^d \]
- **Improved Fat-Shattering Dimension Decomposition**:
  \[ d_{F_G,\gamma}(\epsilon)\leq 32\,C\,d_{G_0}\left(\frac{\epsilon}{4}\right)\log_2\left(\frac{256\,C M^2}{\epsilon^2 d_{G_0}\left(\frac{\epsilon}{4}\right)}\right) \]

Together, these results yield a sample complexity estimate for multi-category classifiers with a sharper dependence on the number of classes.
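As a rough numerical illustration of how the decomposition bound scales with the number of classes, the following Python sketch evaluates the first and third formulas of the summary exactly as written there. The constants `A`, `V`, `K`, and `M` are not given numerically in the summary, so the default value 1.0 used for each is a placeholder assumption. The function names are hypothetical, and the fat-shattering bound is plugged in for \(d_{G_0}\) purely for illustration.

```python
import math


def fat_shattering_bound(eps, d, A=1.0, V=1.0, K=1.0):
    """Evaluate (1 + A * sqrt(V * K / eps)) ** d, the summary's first formula."""
    return (1.0 + A * math.sqrt(V * K / eps)) ** d


def decomposition_bound(eps, C, d, M=1.0, A=1.0, V=1.0, K=1.0):
    """Evaluate 32 * C * d_G0(eps/4) * log2(256 * C * M^2 / (eps^2 * d_G0(eps/4))).

    d_G0(eps/4) is replaced by the fat-shattering bound above (an assumption
    made only to get numbers out of the formula).
    """
    d_g0 = fat_shattering_bound(eps / 4.0, d, A=A, V=V, K=K)
    log_arg = 256.0 * C * M ** 2 / (eps ** 2 * d_g0)
    return 32.0 * C * d_g0 * math.log2(log_arg)


# Tabulate the bound for a few class counts at a fixed scale and dimension.
rows = [(C, decomposition_bound(eps=0.1, C=C, d=2)) for C in (2, 10, 100, 1000)]
for C, bound in rows:
    print(f"C = {C:5d}  bound = {bound:.3e}  bound / C = {bound / C:.3e}")
```

For fixed `eps` and `d`, the ratio `bound / C` grows only logarithmically in `C`, which is the near-linear dependence on the number of classes that the decomposition result is meant to capture.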

Sample Complexity Result for Multi-category Classifiers of Bounded Variation
