Abstract: Continual learning (CL) aims to learn new tasks without forgetting previous tasks. However, existing CL methods require a large amount of raw data, which is often unavailable due to copyright considerations and privacy risks. Instead, stakeholders usually release pre-trained machine learning models as a service (MLaaS), which users can access via APIs. This paper considers two practical-yet-novel CL settings: data-efficient CL (DECL-APIs) and data-free CL (DFCL-APIs), which achieve CL from a stream of APIs with partial or no raw data. Performing CL under these two new settings faces several challenges: unavailable full raw data, unknown model parameters, heterogeneous models of arbitrary architecture and scale, and catastrophic forgetting of previous APIs. To overcome these issues, we propose a novel data-free cooperative continual distillation learning framework that distills knowledge from a stream of APIs into a CL model by generating pseudo data, just by querying APIs. Specifically, our framework includes two cooperative generators and one CL model, whose training is formulated as an adversarial game. We first use the CL model and the current API as fixed discriminators to train the generators via a derivative-free method. The generators adversarially generate hard and diverse synthetic data to maximize the response gap between the CL model and the API. Next, we train the CL model by minimizing the gap between the responses of the CL model and the black-box API on synthetic data, to transfer the API's knowledge to the CL model. Furthermore, we propose a new regularization term based on network similarity to prevent catastrophic forgetting of previous APIs. Our method performs comparably to classic CL with full raw data on MNIST and SVHN in the DFCL-APIs setting. In the DECL-APIs setting, our method achieves 0.97x, 0.75x, and 0.69x the performance of classic CL on CIFAR10, CIFAR100, and MiniImageNet, respectively.
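
As a rough sketch, the adversarial game described in the abstract can be read as a min-max objective. The notation here is assumed for illustration and is not taken from the abstract: $f_{cl}(\cdot;\theta)$ is the CL model, $f_b^k$ is the $k$-th black-box API, $G(\cdot;\phi)$ stands in for the generators, $z$ is input noise, and $d$ is some discrepancy between the two responses whose exact form the abstract does not specify.

$$
\min_{\theta}\;\max_{\phi}\;\mathbb{E}_{z \sim p(z)}\Big[\, d\big(f_{cl}(G(z;\phi);\theta),\; f_b^k(G(z;\phi))\big) \Big]
$$

The inner maximization corresponds to the generators searching for synthetic inputs on which the CL model and the API disagree; the outer minimization corresponds to the CL model closing that gap. This simplified form leaves out the split into two cooperative generators and the forgetting regularizer.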

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses continual learning (CL) when the complete original training data is unavailable. It proposes two new and practical CL settings: data-efficient CL (DECL-APIs) and data-free CL (DFCL-APIs). In these settings, a model learns from a stream of APIs with only a small amount of original data or with none at all.

#### Background and challenges

1. **Data access limitations**:
   - Many valuable, rare datasets (such as company sales data or hospital medical diagnosis data) are inaccessible due to privacy issues.
   - Pretrained models are usually released as a service (MLaaS). Users can access these models only through APIs and cannot obtain their internal parameters or architectures.
2. **Limitations of existing methods**:
   - Existing CL methods require a large amount of original data, which is infeasible in many practical scenarios.
   - These methods cannot learn from black-box APIs because they assume access to the model's internal structure and parameters.

#### Proposed new settings

1. **Data-free CL (DFCL-APIs)**:
   - In this setting, the original training data $D^k$ of task $k$ is not accessible at all; only the responses of the black-box API $f^k_b$ can be obtained.
   - The goal is to make the CL model $f_{cl}$ match the output of the black-box API $f^k_b$ on the generated pseudo-dataset $\hat{D}^k_G$.
2. **Data-efficient CL (DECL-APIs)**:
   - In this setting, a small amount of original data is accessible.
   - The goal is to use this small amount of original data to improve the CL model's learning while avoiding catastrophic forgetting.

#### Solutions

To overcome these challenges, the paper proposes a new data-free cooperative continual distillation learning framework. Its main features are as follows:

1. **Adversarial training of generators and CL models**:
   - Two cooperative generators ($G_A$ and $G_B$) produce "difficult", "diverse" and "class-balanced" pseudo-samples.
   - The generators are trained adversarially so that their pseudo-samples maximize the output gap between the CL model and the API, while the CL model learns the API's knowledge by minimizing that gap.
2. **Zeroth-order gradient estimation**:
   - Because the API is a black box, gradients cannot be computed through it directly, so a zeroth-order gradient estimation method is used to approximate them.
   - The generator's gradient is estimated with the forward-difference method and then used to update the generator's parameters.
3. **Preventing catastrophic forgetting**:
   - A new regularization term based on a network similarity measure prevents the CL model from forgetting previously learned APIs.
   - Replaying a small amount of old data or generated pseudo-data further alleviates catastrophic forgetting.

#### Experimental verification

The paper reports extensive experiments in multiple typical scenarios, including computer vision and natural language processing tasks. In the DFCL-APIs setting, the proposed framework performs comparably to classical CL on the MNIST and SVHN datasets, even without original data. On the more challenging CIFAR10, CIFAR100 and MiniImageNet datasets, the proposed framework in the DECL-APIs setting reaches 0.97 times, 0.75 times and 0.69 times the performance of classical CL, respectively.

### Summary

This paper broadens the practical scope of CL by proposing the DFCL-APIs and DECL-APIs settings together with a data-free cooperative continual distillation learning framework for them. The framework performs CL without original data while mitigating catastrophic forgetting, and it reaches the performance levels reported above on a variety of tasks and datasets.
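
The zeroth-order gradient estimation step can be illustrated with a minimal sketch. It assumes a random-direction forward-difference estimator, which is one common variant and not necessarily the paper's exact procedure. The function name `forward_difference_grad`, the parameters `num_directions` and `mu`, and the toy `response_gap` objective are all hypothetical; in the real setting the objective would be evaluated by querying the black-box API.

```python
import numpy as np


def forward_difference_grad(loss_fn, params, num_directions=20, mu=1e-3, rng=None):
    """Estimate the gradient of loss_fn at params from function values only.

    Assumed estimator: average of d * (f(x + mu*u) - f(x)) / mu * u over
    random unit directions u, where d is the number of parameters.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = params.size
    base = loss_fn(params)  # one query at the current point
    grad = np.zeros(d, dtype=float)
    for _ in range(num_directions):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)  # uniform direction on the unit sphere
        # Forward difference along u approximates the directional derivative.
        grad += (loss_fn(params + mu * u) - base) / mu * u
    # Scaling by d makes the estimator unbiased for smooth functions as mu -> 0.
    return grad * d / num_directions


# Toy stand-in for the "response gap" seen through a black box (hypothetical).
def response_gap(p):
    return float(np.sum(p ** 2))


# One gradient-ascent step on generator parameters, using queries only.
phi = np.array([1.0, -2.0, 0.5])
grad_est = forward_difference_grad(response_gap, phi, num_directions=200,
                                   rng=np.random.default_rng(0))
phi_next = phi + 0.1 * grad_est  # ascent: the generator maximizes the gap
```

Each estimate costs `num_directions + 1` evaluations of the objective, so in an API setting the number of directions trades query budget against estimator variance.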

Continual Learning From a Stream of APIs
