Abstract: Although neural networks have been used extensively in pattern recognition, acquiring datasets in advance remains challenging: in most pattern recognition areas, preparing a training dataset that covers all data domains is difficult. Incremental learning was proposed to update neural networks in an online manner, but catastrophic forgetting remains an open problem. Class-incremental learning is one of the most challenging incremental learning settings; it trains a unified model to classify all incrementally arriving classes learned so far equally well. Prior studies on class-incremental learning favor model stability over plasticity in order to preserve old knowledge and prevent catastrophic forgetting. Consequently, the model's plasticity is neglected, which makes it difficult to generalize to new data. We propose a novel distillation-based method named Hyper-feature Aggregation and Relaxed Distillation (HARD) to balance the optimization of old and new knowledge. Feature aggregation is proposed to capture global semantics while maintaining the diversity of the feature distribution after the representations of exemplars are promoted to higher dimensions. The proposed algorithm also introduces a relaxed restriction that conditions the hyper-feature space through a normalized comparison of the relation matrices. As it generalizes to more classes, the model is encouraged to rebuild the feature distribution when it meets new classes and to fine-tune the feature space to obtain more distinct interclass boundaries. Extensive experiments were conducted on two benchmark datasets, and consistent improvements under diverse experimental settings demonstrated the effectiveness of the proposed approach.
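The idea of constraining relations between samples rather than the features themselves can be illustrated with a small sketch. The abstract does not give the exact loss, so everything below is an assumption made for illustration: the function names (`relation_matrix`, `relaxed_distillation_loss`) are hypothetical, cosine similarity stands in for the "normalized comparison", and a mean squared difference stands in for the distance between relation matrices. It should not be read as the HARD formulation.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def relation_matrix(features):
    """Pairwise cosine similarities between all samples in a batch.

    Assumption: cosine similarity is used as the normalized relation;
    the paper may define its relation matrices differently.
    """
    unit = [l2_normalize(f) for f in features]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def relaxed_distillation_loss(old_features, new_features):
    """Mean squared difference between old-model and new-model relation matrices.

    Only the relations between samples are compared, so the new model is
    free to move its features as long as pairwise structure is kept.
    """
    r_old = relation_matrix(old_features)
    r_new = relation_matrix(new_features)
    n = len(r_old)
    total = sum(
        (r_old[i][j] - r_new[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return total / (n * n)


# Features from the old model for three exemplars (toy 2-D values).
old = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# The same features rotated by 90 degrees and scaled by 3.
rotated_scaled = [[0.0, 3.0], [-3.0, 0.0], [-3.0, 3.0]]
# Features where the first two exemplars have collapsed onto one direction.
collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]

loss_same_structure = relaxed_distillation_loss(old, rotated_scaled)
loss_collapsed = relaxed_distillation_loss(old, collapsed)
```

In this toy case the rotated and rescaled features incur essentially no penalty, because their pairwise relations are unchanged, whereas the collapsed features are penalized. A direct feature-matching loss would penalize both, which is the sense in which a relation-based constraint is more relaxed and leaves room for plasticity.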
