On Representation-Level Forgetting in Class Incremental Learning: What's the Bottleneck?
Zixuan Ni, Haizhou Shi, Longhui Wei, Yueting Zhuang, Qi Tian, Siliang Tang
DOI: https://doi.org/10.2139/ssrn.4383369
2023-01-01
Abstract: Although the concept of Catastrophic Forgetting (CF) is straightforward in Class Incremental Learning (CIL), its causes in models remain vague. In this paper, we introduce metrics of representation quality and use them to analyze, systematically and quantitatively, the three primary causes of catastrophic forgetting in CIL: Intra-phase Forgetting (IpF), Inter-phase Confusion (IpC), and Classifier Deviation (CD). (i) IpF occurs when the learner fails to correctly align same-phase data as training proceeds; (ii) IpC occurs when the learner confuses current-phase data with data from previous phases; and (iii) CD occurs when the old classifier deviates from the current representation space. Through extensive study, we find that the current distillation process effectively addresses IpF and that NN-based techniques partially alleviate CD, whereas IpC still awaits further exploration. To spark a more in-depth inquiry into this question, we propose a simple yet effective framework, Contrastive Class Concentration for CIL (C4IL). It combines the benefits of distillation and contrastive learning to produce a representation distribution that is more cohesive within each class and more clearly separated across classes, reducing representation overlap across training phases and thereby addressing IpC. Quantitative experiments show the effectiveness of the framework: it outperforms the baseline method iCaRL by 10% in top-1 accuracy at the final training phase, and its average accuracy is competitive with the recent method RMM. Qualitative results further show that C4IL substantially lowers the probability of inter-phase confusion and yields a more compact representation distribution, which alleviates the IpC problem.
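To make the idea of pairing contrastive learning with distillation concrete, the sketch below combines a supervised contrastive loss (in the style of Khosla et al., 2020), which pulls same-class embeddings together and pushes other classes apart, with a simple feature-distillation term that keeps new-model features close to those of a frozen old model. This is an illustrative assumption, not the C4IL objective from the paper: the names `supcon_loss`, `feature_distillation`, `combined_loss`, the temperature `tau`, and the weight `lam` are all hypothetical, and NumPy stands in for a training framework.

```python
import numpy as np


def l2_normalize(z, eps=1e-12):
    # Project each row onto the unit hypersphere, as is common for contrastive losses.
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss (SupCon-style sketch, not the paper's exact loss).

    Lower values mean embeddings are concentrated by class and separated across classes.
    """
    z = l2_normalize(z)
    n = z.shape[0]
    sim = z @ z.T / tau
    # Exclude each sample's similarity with itself from every term.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)
    # Numerically stable row-wise log-softmax over all other samples.
    row_max = np.max(sim, axis=1, keepdims=True)
    log_norm = np.log(np.sum(np.exp(sim - row_max), axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm
    # Positives: other samples that share the anchor's label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # only anchors with at least one positive contribute
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return -np.mean(pos_log_prob[valid] / pos_count[valid])


def feature_distillation(z_new, z_old):
    # Mean squared distance between normalized new features and frozen old features.
    diff = l2_normalize(z_new) - l2_normalize(z_old)
    return np.mean(np.sum(diff ** 2, axis=1))


def combined_loss(z_new, z_old, labels, lam=1.0, tau=0.1):
    # Hypothetical weighting: class concentration plus lam * distillation.
    return supcon_loss(z_new, labels, tau) + lam * feature_distillation(z_new, z_old)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.array([0, 0, 1, 1, 2, 2])
    # Tightly clustered embeddings: one near-orthogonal center per class.
    tight = np.eye(3, 8)[labels] * 5.0 + 0.01 * rng.normal(size=(6, 8))
    good = supcon_loss(tight, labels)
    # Same embeddings, but labels that mix the clusters (overlapping classes).
    bad = supcon_loss(tight, np.array([0, 1, 2, 0, 1, 2]))
    print(good < bad)  # True
```

The comparison at the bottom illustrates why such a loss targets inter-phase confusion: the same embeddings score far better when each class occupies its own compact region than when classes overlap. In an incremental setting, `z_old` would come from the model trained on earlier phases, so the distillation term discourages drift while the contrastive term sharpens class boundaries.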