Abstract: In this paper, we introduce a novel discriminative loss function with large margin in the context of Deep Learning. This loss boosts the discriminative power of neural nets, represented by intra-class compactness and inter-class separability. On the one hand, the class compactness is ensured by close distance of samples of the same class to each other. On the other hand, the inter-class separability is boosted by a margin loss that ensures the minimum distance of each class to its closest boundary. All the terms in our loss have an explicit meaning, giving a direct view of the feature space obtained. We analyze mathematically the relation between compactness and margin term, giving a guideline about the impact of the hyper-parameters on the learned features. Moreover, we also analyze properties of the gradient of the loss with respect to the parameters of the neural net. Based on this, we design a strategy called partial momentum updating that enjoys simultaneously stability and consistency in training. Furthermore, we also investigate generalization errors to have better theoretical insights. Our loss function systematically boosts the test accuracy of models compared to the standard softmax loss in our experiments.

What problem does this paper attempt to address?

The paper addresses a shortcoming of the standard softmax loss in deep-learning classification. Although the softmax loss helps the network produce separable features, those features may not be discriminative enough: intra-class variation can be large while inter-class differences remain small. As a result, the model performs poorly on problems with large intra-class variation and high inter-class similarity. To improve generalization, the authors propose a new loss function that enhances intra-class compactness and inter-class separability at the same time. The specific problems and solutions discussed in the paper are as follows.

### 1. **Intra-class Compactness**

- **Problem**: The standard softmax loss does not ensure that samples of the same class cluster together in the feature space, so intra-class samples can be overly dispersed.
- **Solution**: A hinged center loss pulls the samples of each class toward their class center without collapsing them into a single point. A predefined distance threshold \(\delta_v\) prevents this feature collapse.

### 2. **Inter-class Separability**

- **Problem**: The standard softmax loss does not explicitly account for the boundaries between classes, so samples of different classes may lie too close together, which hurts classification performance.
- **Solution**: A margin loss maximizes the distance between each class center and the decision boundary, ensuring sufficient separation between classes. Specifically, for each class \(c\), the minimum distance to the decision boundaries with the other classes is computed and optimized to exceed a predefined threshold \(\delta_d\).

### 3. **Comprehensive Objective**

- **Problem**: A method is needed that improves intra-class compactness and inter-class separability simultaneously.
- **Solution**: A combined loss function brings together the hinged center loss and the margin loss, with a regularization term added to prevent over-fitting. The loss has the form:

\[ L=\alpha\cdot L_{\text{compact}}+\beta\cdot L_{\text{margin}}+\gamma\cdot L_{\text{reg}} \]

where \(L_{\text{compact}}\) enhances intra-class compactness, \(L_{\text{margin}}\) enhances inter-class separability, and \(L_{\text{reg}}\) performs regularization.

### 4. **Theoretical Analysis and Experimental Verification**

- **Theoretical Analysis**: The authors analyze the relationship between intra-class compactness and inter-class margin through mathematical derivation and give guidelines for hyper-parameter selection.
- **Experimental Verification**: Experiments on standard datasets such as CIFAR10 and SVHN show that the proposed loss function improves the test accuracy of the model compared with the standard softmax loss.

### Summary

The main contribution of this paper is a new loss function that simultaneously enhances intra-class compactness and inter-class separability in deep-learning classification, thereby improving the discriminative ability and generalization performance of the model. Theoretical analysis and experiments support the effectiveness of the method.
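To make the combined objective described above more concrete, here is a minimal NumPy sketch of its three terms. It is an illustration under stated assumptions, not the authors' implementation: the classifier is assumed to be a linear softmax layer (`logits = W @ z + b`), so the boundary between two classes is a hyperplane; the hinges are averaged rather than summed; and `l_reg` is a placeholder L2 penalty on `W`, since the summary does not specify the regularizer. All function and variable names are hypothetical.

```python
import numpy as np


def hinged_center_loss(features, labels, centers, delta_v):
    """Mean of max(0, ||z_i - c_{y_i}|| - delta_v) over the batch.

    Samples already within delta_v of their class center contribute nothing,
    which is what keeps a class from collapsing to a single point.
    """
    dists = np.linalg.norm(features - centers[labels], axis=1)
    return np.maximum(0.0, dists - delta_v).mean()


def class_margins(centers, W, b):
    """Signed distance from each class center to its closest pairwise boundary.

    Assumption: with a linear layer, the boundary between classes i and j is
    the hyperplane (w_i - w_j) . z + (b_i - b_j) = 0. Rows of W are assumed
    to be pairwise distinct so the norm below is never zero.
    """
    num_classes = centers.shape[0]
    margins = np.empty(num_classes)
    for i in range(num_classes):
        others = np.arange(num_classes) != i
        w_diff = W[i] - W[others]              # shape (C - 1, d)
        b_diff = b[i] - b[others]              # shape (C - 1,)
        signed = (w_diff @ centers[i] + b_diff) / np.linalg.norm(w_diff, axis=1)
        margins[i] = signed.min()              # closest boundary for class i
    return margins


def margin_loss(centers, W, b, delta_d):
    """Mean of max(0, delta_d - margin_c) over classes."""
    return np.maximum(0.0, delta_d - class_margins(centers, W, b)).mean()


def total_loss(features, labels, centers, W, b,
               delta_v, delta_d, alpha=1.0, beta=1.0, gamma=1e-3):
    """L = alpha * L_compact + beta * L_margin + gamma * L_reg."""
    l_compact = hinged_center_loss(features, labels, centers, delta_v)
    l_margin = margin_loss(centers, W, b, delta_d)
    l_reg = np.sum(W ** 2)  # placeholder regularizer (assumption)
    return alpha * l_compact + beta * l_margin + gamma * l_reg


# Toy 2-D, 2-class example with hand-picked values.
features = np.array([[2.0, 0.0], [2.0, 1.0], [-2.0, 0.0], [-5.0, 0.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[2.0, 0.0], [-2.0, 0.0]])
W = np.array([[1.0, 0.0], [-1.0, 0.0]])
b = np.zeros(2)

print("compactness:", hinged_center_loss(features, labels, centers, delta_v=1.0))
print("margins:", class_margins(centers, W, b))
print("margin loss:", margin_loss(centers, W, b, delta_d=3.0))
print("total:", total_loss(features, labels, centers, W, b,
                           delta_v=1.0, delta_d=3.0, gamma=0.5))
```

In the toy example only the sample at distance 3 from its center exceeds \(\delta_v = 1\), so it alone is penalized by the compactness term; both centers sit at distance 2 from the boundary, so the margin term is active only when \(\delta_d\) is larger than 2.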

Large Margin Discriminative Loss for Classification
