Adaptive Margin Global Classifier for Exemplar-Free Class-Incremental Learning

Zhongren Yao, Xiaobin Chang
2024-09-20
Abstract: Exemplar-free class-incremental learning (EFCIL) presents a significant challenge as the old class samples are absent for new task learning. Due to the severe imbalance between old and new class samples, the learned classifiers can be easily biased toward the new ones. Moreover, continually updating the feature extractor under EFCIL can compromise the discriminative power of old class features, e.g., leading to less compact and more overlapping distributions across classes. Existing methods mainly focus on handling biased classifier learning. In this work, both cases are considered using the proposed method. Specifically, we first introduce a Distribution-Based Global Classifier (DBGC) to avoid bias factors in existing methods, such as data imbalance and sampling. More importantly, the compromised distributions of old classes are simulated via a simple operation, variance enlarging (VE). Incorporating VE based on DBGC results in a novel classification loss for EFCIL. This loss is proven equivalent to an Adaptive Margin Softmax Cross Entropy (AMarX). The proposed method is thus called Adaptive Margin Global Classifier (AMGC). AMGC is simple yet effective. Extensive experiments show that AMGC achieves superior image classification results on its own under a challenging EFCIL setting. Detailed analysis is also provided for further demonstration.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two main problems in Exemplar-Free Class-Incremental Learning (EFCIL):

1. **Classifier Bias**: Old-class samples are unavailable when a new task is learned, so the training data is severely imbalanced between old and new classes and the classifier is easily biased toward the new classes.
2. **Old-Class Feature Degradation**: Continually updating the feature extractor weakens the discriminative power of old-class features, so the old-class feature distributions become less compact and overlap more, as shown in Figure 1.

### Solutions

To address these problems, the authors propose the Adaptive Margin Global Classifier (AMGC). Its main contributions are:

- **Introducing the Distribution-Based Global Classifier (DBGC)**: DBGC uses the statistics (mean vector \(\mu\) and covariance matrix \(\Sigma\)) of both old and new classes, aiming to alleviate the sampling bias and local optimum problems in existing methods.
- **Simulating Old-Class Feature Degradation**: The Variance Enlarging (VE) technique simulates the degradation of old-class features by increasing the diagonal entries of each old-class covariance matrix:

  \[ \hat{\Sigma}_k=\Sigma_k + \lambda\Lambda_k \]

  where \(\Sigma_k\) is the covariance matrix of old class \(k\), \(\Lambda_k\) is the diagonal matrix holding the diagonal entries of \(\Sigma_k\), and \(\lambda> 0\) is a hyperparameter.
- **Proposing the Adaptive Margin Softmax Cross-Entropy Loss (AMarX)**: Combining VE with DBGC yields a new classification loss, AMarX. It can be read as a softmax cross-entropy loss with an adaptive margin that differs from class to class, which lets it better handle old-class feature degradation.

The final model, AMGC, combines DBGC and AMarX.
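The variance-enlarging step described above can be illustrated with a minimal sketch. It assumes the covariance matrix is stored as a plain list of rows; the function name `enlarge_variance` and the example values are hypothetical and do not come from the paper.

```python
def enlarge_variance(sigma, lam):
    """Return sigma + lam * diag(sigma) for a square covariance matrix.

    sigma: d x d covariance matrix as a list of rows (assumed layout).
    lam:   the positive hyperparameter lambda.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Copy the rows so the stored old-class statistics are left untouched.
    enlarged = [row[:] for row in sigma]
    for i in range(len(sigma)):
        # Lambda_k keeps only the diagonal of Sigma_k, so only the
        # per-dimension variances grow; the off-diagonal entries stay the same.
        enlarged[i][i] = sigma[i][i] + lam * sigma[i][i]
    return enlarged


sigma_k = [[2.0, 0.5], [0.5, 1.0]]
sigma_hat = enlarge_variance(sigma_k, lam=0.5)
print(sigma_hat)  # [[3.0, 0.5], [0.5, 1.5]]
```

Each variance is scaled by a factor of \(1+\lambda\) while the covariances between dimensions are unchanged, which is one way to read "less compact" old-class distributions.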
The experimental results show that AMGC achieves state-of-the-art performance on multiple datasets.

### Formula Summary

- **DBGC Loss**:

  \[ L_{DB}(\mu,\Sigma;\theta,\phi)=\frac{1}{K}\sum_{k = 1}^{K}\log\left(\sum_{j = 1}^{K}e^{\omega_j^T\mu_k+\frac{1}{2}\omega_j^T\Sigma_k\omega_j+\delta_j}\right) \]
- **AMarX Loss**:

  \[ L_{o}^{AMarX}=-\frac{1}{O_t}\sum_{k = 1}^{O_t}\log\frac{e^{\omega_k^T\mu_k + b_k - m_k}}{e^{\omega_k^T\mu_k + b_k - m_k}+\sum_{j\neq k}e^{\omega_j^T\mu_k + b_j+\sigma_{j,k}+\beta_{j,k}}} \]

  where \(m_k=\frac{\lambda}{2}\omega_k^T\Lambda_k\omega_k\) is the margin for old class \(k\), which scales linearly with the VE hyperparameter \(\lambda\).

These two losses are the components that make up AMGC's classification objective for EFCIL.
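To make the formulas above concrete, here is a small sketch that evaluates the DBGC loss exactly as it is written in this summary and computes the margin \(m_k\). The summary does not define \(\omega_j\) and \(\delta_j\), so the code assumes they are one fixed vector and one fixed scalar per class. The helper names (`quad_form`, `dbgc_loss`, `adaptive_margin`) and the example values are hypothetical, and none of this is the authors' implementation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def quad_form(w, mat):
    """Compute w^T mat w for a vector w and a square matrix mat."""
    n = len(w)
    return sum(w[a] * mat[a][b] * w[b] for a in range(n) for b in range(n))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dbgc_loss(omegas, deltas, mus, sigmas):
    """Mean over classes k of
    log sum_j exp(w_j^T mu_k + 0.5 * w_j^T Sigma_k w_j + delta_j).

    Assumption: omegas[j] and deltas[j] are fixed per-class quantities.
    """
    num_classes = len(mus)
    total = 0.0
    for k in range(num_classes):
        exponents = [
            dot(omegas[j], mus[k])
            + 0.5 * quad_form(omegas[j], sigmas[k])
            + deltas[j]
            for j in range(num_classes)
        ]
        total += logsumexp(exponents)
    return total / num_classes


def adaptive_margin(omega_k, sigma_k, lam):
    """m_k = (lam / 2) * w_k^T Lambda_k w_k, with Lambda_k = diag(Sigma_k)."""
    return 0.5 * lam * sum(
        omega_k[i] ** 2 * sigma_k[i][i] for i in range(len(omega_k))
    )


omega_k = [1.0, 2.0]
sigma_k = [[2.0, 0.5], [0.5, 1.0]]
lam = 0.5
sigma_hat = [[3.0, 0.5], [0.5, 1.5]]  # sigma_k + lam * diag(sigma_k)

m_k = adaptive_margin(omega_k, sigma_k, lam)
# Replacing Sigma_k by the enlarged covariance shifts the quadratic term
# 0.5 * w^T Sigma w by exactly m_k.
gap = 0.5 * quad_form(omega_k, sigma_hat) - 0.5 * quad_form(omega_k, sigma_k)
print(m_k, gap)  # 1.5 1.5
```

The last two lines check the identity \(\frac{1}{2}\omega^T(\Sigma_k+\lambda\Lambda_k)\omega=\frac{1}{2}\omega^T\Sigma_k\omega+\frac{\lambda}{2}\omega^T\Lambda_k\omega\), which shows where a term of the form \(m_k\) comes from once variance enlarging is applied inside the quadratic term of the DBGC loss.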