Abstract: We propose an approach for anytime continual learning (AnytimeCL) for open vocabulary image classification. The AnytimeCL problem aims to break away from batch training and rigid models by requiring that a system can predict any set of labels at any time and efficiently update and improve when receiving one or more training samples at any time. Despite the challenging goal, we achieve substantial improvements over recent methods. We propose a dynamic weighting between predictions of a partially fine-tuned model and a fixed open vocabulary model that enables continual improvement when training samples are available for a subset of a task's labels. We also propose an attention-weighted PCA compression of training features that reduces storage and computation with little impact to model accuracy. Our methods are validated with experiments that test flexibility of learning and inference. Code is available at https://github.com/jessemelpolio/AnytimeCL.

What problem does this paper attempt to address?

The paper addresses continual learning for open-vocabulary image classification: how to update and improve a model efficiently as new labeled data arrives, while keeping its ability to predict over any label set. Specifically, the paper focuses on:

1. **Breaking the limitations of batch training and fixed models**: The system must be able to predict any label set at any point in time and to update efficiently when it receives one or more training samples.
2. **Improving the performance of open-vocabulary classification**: Open-vocabulary models such as CLIP are trained on large-scale Internet data, yet their performance on many tasks is still unsatisfactory. The paper therefore aims to keep improving these models through continual learning.
3. **Achieving "anytime" continual learning (AnytimeCL)**: The system should update quickly whenever new samples arrive and keep its ability to predict any label set throughout the process.

To achieve these goals, the authors propose the following methods:

- **Dynamic weighted prediction**: Combine the predictions of the partially fine-tuned model and the fixed open-vocabulary model with a dynamic weighting, so that the system keeps improving even when training samples cover only a subset of a task's labels.
- **Attention-weighted PCA compression**: Compress the stored training features to reduce storage and computation with little impact on model accuracy.
- **Partial fine-tuning**: Fine-tune only the last transformer block of the model and keep the label embeddings fixed, so that general features are retained while task-specific performance improves.
- **Loss function modification**: Add a loss term that lets the model predict "none of the above" when the true label is not in the candidate label set, which improves overall performance.

The paper validates the flexibility and effectiveness of these methods in multiple experiments covering data-incremental, class-incremental, and task-incremental learning scenarios, and reports substantial improvements over recent methods.
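
As a rough illustration of the dynamic weighted prediction idea, the sketch below blends the class probabilities of a fine-tuned model and a frozen open-vocabulary model using one weight per label. The weighting rule (each model's share of the running accuracy for that label, with labels that have no training samples falling back entirely to the frozen model), the function names, and the example numbers are all assumptions made for illustration; the paper's actual weighting scheme may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_predictions(tuned_logits, frozen_logits, tuned_acc, frozen_acc, seen_counts):
    """Blend two models' class probabilities with per-label weights.

    Hypothetical rule (an assumption, not the paper's formula):
      w = tuned_acc / (tuned_acc + frozen_acc) for labels with training
      samples, and w = 0 for labels the tuned model has never seen.
    """
    p_tuned = softmax(tuned_logits)
    p_frozen = softmax(frozen_logits)
    blended = []
    for pt, pf, at, af, n in zip(p_tuned, p_frozen, tuned_acc, frozen_acc, seen_counts):
        if n == 0 or (at + af) == 0:
            w = 0.0  # no evidence for the tuned model: trust the frozen model
        else:
            w = at / (at + af)
        blended.append(w * pt + (1.0 - w) * pf)
    # Renormalize so the blended scores form a probability distribution.
    total = sum(blended)
    return [b / total for b in blended]


if __name__ == "__main__":
    labels = ["cat", "dog", "zebra"]  # "zebra" has no training samples yet
    probs = blend_predictions(
        tuned_logits=[2.0, 0.5, -1.0],
        frozen_logits=[1.0, 0.8, 0.9],
        tuned_acc=[0.9, 0.6, 0.0],    # running accuracy estimates (made up)
        frozen_acc=[0.6, 0.6, 0.5],
        seen_counts=[10, 10, 0],
    )
    for label, p in zip(labels, probs):
        print(f"{label}: {p:.3f}")
```

The point of the per-label fallback is that a label set can mix classes with and without training data: the blended prediction stays usable for unseen labels and shifts toward the fine-tuned model only where that model has shown itself to be more accurate.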

Anytime Continual Learning for Open Vocabulary Classification

Continual Learning in Open-vocabulary Classification with Complementary Memory Systems

CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning

Simple Image-level Classification Improves Open-vocabulary Object Detection

Enhancing Visual Continual Learning with Language-Guided Supervision

Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images

Online continual learning in image classification: An empirical survey

Don't Stop Learning: Towards Continual Learning for the CLIP Model

Effective Continual Learning for Text Classification with Lightweight Snapshots

The CLEAR Benchmark: Continual LEArning on Real-World Imagery

Open Vocabulary Multi-Label Video Classification

Open-Vocabulary Calibration for Fine-tuned CLIP

From Categories to Classifiers: Name-Only Continual Learning by Exploring the Web

Open-Vocabulary Object Detection using Pseudo Caption Labels

Effectiveness of Vision Language Models for Open-world Single Image Test Time Adaptation

Towards Open Vocabulary Learning: A Survey

Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning

Delving into the Openness of CLIP

SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model

A Comprehensive Empirical Evaluation on Online Continual Learning