Sample Complexity Result for Multi-category Classifiers of Bounded Variation

Khadija Musayeva
DOI: https://doi.org/10.48550/arXiv.2003.09176
2020-05-24
Abstract: We control the probability of the uniform deviation between empirical and generalization performances of multi-category classifiers by an empirical L1-norm covering number when these performances are defined on the basis of the truncated hinge loss function. The only assumption made on the functions implemented by multi-category classifiers is that they are of bounded variation (BV). For such classifiers, we derive the sample size estimate sufficient for the mentioned performances to be close with high probability. Particularly, we are interested in the dependency of this estimate on the number C of classes. To this end, first, we upper bound the scale-sensitive version of the VC-dimension, the fat-shattering dimension of sets of BV functions defined on R^d, which gives an O(1/epsilon^d) bound as the scale epsilon goes to zero. Secondly, we provide a sharper decomposition result for the fat-shattering dimension in terms of C, which for sets of BV functions gives an improvement from O(C^(d/2+1)) to O(C ln^2(C)). This improvement then propagates to the sample complexity estimate.
Machine Learning, Functional Analysis
What problem does this paper attempt to address?
The paper addresses sample complexity estimation for multi-category classifiers. Specifically, it controls the probability of the uniform deviation between the empirical and generalization performances of such classifiers by an empirical L1-norm covering number, where both performances are defined through the truncated hinge loss function. The only assumption is that the functions implemented by the classifiers are of bounded variation (BV). Under this assumption, the author derives a sample size sufficient for the two performances to be close with high probability, with particular attention to how this estimate depends on the number of classes \(C\).

### Main Contributions

1. **Sample Complexity Estimation**: The author bounds the probability of the uniform deviation between the empirical and generalization performances of multi-category classifiers by an empirical L1-norm covering number, and derives a sample size estimate from that bound.
2. **Dependence on the Number of Classes \(C\)**: The author studies how the sample complexity estimate depends on \(C\) and provides a sharper decomposition result, improving the dependence from \(O(C^{d/2 + 1})\) to \(O(C\ln^2(C))\).

### Technical Details

- **Bounded Variation Function**: The paper assumes that the functions implemented by the classifiers are of bounded variation (BV). By Helly's selection theorem, the BV space is compactly embedded into the L1 space.
- **Empirical L1-norm Covering Number**: The author uses the empirical L1-norm covering number to control the deviation between the empirical and generalization performances of multi-category classifiers.
- **Decomposition Result**: The author provides a new decomposition result for the fat-shattering dimension, which for sets of BV functions improves the dependence on the number of classes from \(O(C^{d/2 + 1})\) to \(O(C\ln^2(C))\).

### Formula Summary

- **Upper Bound of Fat-Shattering Dimension**:
  \[ d_F(\epsilon) \leq \left(1 + A\sqrt{\frac{VK}{\epsilon}}\right)^d \]
- **Covering Number Bound**:
  \[ \ln N(\epsilon, F, d_1, t_n) \leq KM\left(\sqrt{\frac{dAVK_P}{\epsilon}}\right)^d\left(\frac{2}{\epsilon}\right)^d \]
- **Improved Fat-Shattering Dimension Decomposition**:
  \[ d_{F_G,\gamma}(\epsilon) \leq 32\,C\,d_{G_0}\left(\frac{\epsilon}{4}\right)\log_2\left(\frac{256\,C M^2}{\epsilon^2\, d_{G_0}\left(\frac{\epsilon}{4}\right)}\right) \]

Together, these results yield a sample complexity estimate for multi-category classifiers with a sharper dependence on the number of classes.
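To get a feel for how the decomposition bound scales with the number of classes, the formulas quoted above can be evaluated numerically. The sketch below is only illustrative: the function names are hypothetical, the constants `A`, `V`, `K` and `M` are set to 1 because their values are not given here, and feeding the first bound in as the value of \(d_{G_0}(\epsilon/4)\) is an assumption made for the example, not something stated in the summary.

```python
import math


def fat_shattering_bound(eps, d, A=1.0, V=1.0, K=1.0):
    """Evaluate (1 + A*sqrt(V*K/eps))**d, the bound quoted in the summary.

    A, V and K are placeholders: their actual values are not given here.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return (1.0 + A * math.sqrt(V * K / eps)) ** d


def decomposed_bound(C, eps, d_g0, M=1.0):
    """Evaluate 32*C*d_g0*log2(256*C*M**2 / (eps**2 * d_g0)).

    d_g0 stands for the value d_{G_0}(eps/4); the caller supplies it.
    """
    return 32.0 * C * d_g0 * math.log2(256.0 * C * M**2 / (eps**2 * d_g0))


def old_rate(C, d):
    """Growth rate C**(d/2 + 1), with all constants dropped."""
    return C ** (d / 2 + 1)


def new_rate(C):
    """Growth rate C*ln(C)**2, with all constants dropped."""
    return C * math.log(C) ** 2


if __name__ == "__main__":
    d, eps = 2, 0.1
    # Assumption for illustration: the class G_0 obeys the first bound.
    d_g0 = fat_shattering_bound(eps / 4, d)
    for C in (10, 100, 1000):
        print(C, old_rate(C, d), new_rate(C), decomposed_bound(C, eps, d_g0))
```

For fixed `eps` and `d_g0`, `decomposed_bound` grows like \(C\log C\) in this form. The two rate functions compare only the orders of growth stated in the abstract, so their absolute values carry no meaning.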