Abstract: Discovering novel visual categories from a set of unlabeled images is a crucial capability for intelligent vision systems, since it enables them to learn new concepts automatically without human-annotated supervision. To tackle this problem, existing approaches first pretrain a neural network on a set of labeled images and then fine-tune the network to cluster unlabeled images into a few categorical groups. However, their unified feature representation hits a tradeoff bottleneck between feature preservation on labeled data and feature adaptation on unlabeled data. To circumvent this bottleneck, we propose a residual-tuning approach, which estimates a new residual feature from the pretrained network and adds it to the previous basic feature, so that the two jointly compute the clustering objective. This disentangled representation makes it easier to adapt visual representations to unlabeled images while avoiding forgetting of the knowledge acquired from labeled images, without replaying the labeled images. In addition, residual-tuning is efficient: it adds few parameters and consumes modest training time. Our results on three common benchmarks show consistent and considerable gains over other state-of-the-art methods and further reduce the performance gap to the fully supervised setup. Moreover, we explore two extended scenarios, using fewer labeled classes and continually discovering more unlabeled sets, where the results further demonstrate the advantages of residual-tuning over previous approaches. Our code is available at https://github.com/liuyudut/ResTune.
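
The core idea in the abstract, a frozen basic feature plus a trainable residual feature whose sum feeds the clustering objective, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The linear maps, the dimensions, the zero initialisation of the residual branch, and all names (`w_basic`, `w_residual`, `w_head`, `combined_feature`, `cluster_probs`) are hypothetical choices made only for illustration.

```python
import math
import random


def linear(x, weight):
    """Apply a bias-free linear map; weight has one row per output unit."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weight]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_feature(x, w_basic, w_residual):
    """Return basic(x) + residual(x); only w_residual is meant to be trained."""
    basic = linear(x, w_basic)
    residual = linear(x, w_residual)
    return [b + r for b, r in zip(basic, residual)]


def cluster_probs(x, w_basic, w_residual, w_head):
    """Soft cluster assignment computed from the combined feature."""
    return softmax(linear(combined_feature(x, w_basic, w_residual), w_head))


rng = random.Random(0)
in_dim, feat_dim, n_clusters = 4, 3, 2

# Hypothetical stand-in for the network pretrained on labeled images (kept frozen).
w_basic = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(feat_dim)]
# Residual branch, zero-initialised (an assumption) so tuning starts from the basic feature.
w_residual = [[0.0] * in_dim for _ in range(feat_dim)]
# Hypothetical clustering head over the combined feature.
w_head = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(n_clusters)]

x = [0.5, -1.0, 0.25, 2.0]
basic_before = linear(x, w_basic)
feature_at_init = combined_feature(x, w_basic, w_residual)

# Simulate one update that touches only the residual branch.
w_residual[0][0] += 0.1
feature_after = combined_feature(x, w_basic, w_residual)
probs = cluster_probs(x, w_basic, w_residual, w_head)
```

In this sketch the update changes the combined feature used for clustering, while `linear(x, w_basic)` returns the same values as before. That is the sense in which the basic feature learned from labeled data is preserved while the residual adapts to unlabeled data.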

Bi-tuning of Pre-trained Representations

Bi-tuning: Efficient Transfer from Pre-trained Models

Co-Tuning for Transfer Learning

Improved Visual Fine-tuning with Natural Language Supervision

Improved Fine-Tuning by Better Leveraging Pre-Training Data

KNN-BERT: Fine-Tuning Pre-Trained Models with KNN Classifier

DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration

A Closer Look at How Fine-tuning Changes BERT

Parameter-Efficient Tuning Makes a Good Classification Head

Effective Domain Knowledge Transfer with Soft Fine-tuning

Multifaceted Analysis of Fine-Tuning in Deep Model for Visual Recognition

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

Two-Stage Fine-Tuning: A Novel Strategy for Learning Class-Imbalanced Data

Why pre-training is beneficial for downstream classification tasks?

Layer-wise Learning Rate Optimization for Task-Dependent Fine-Tuning of Pre-trained Models: An Evolutionary Approach

On the Generalization Ability of Unsupervised Pretraining

$\mathcal{Y}$-Tuning: An Efficient Tuning Paradigm for Large-Scale Pre-Trained Models via Label Representation Learning

Residual Tuning: Toward Novel Category Discovery Without Labels

What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation

Boundary Matters: A Bi-Level Active Finetuning Framework

How to Fine-Tune BERT for Text Classification?