Abstract: Few-shot classification requires adapting knowledge learned from a large annotated base dataset to recognize novel unseen classes, each represented by few labeled examples. In such a scenario, pretraining a high-capacity network on the large dataset and then finetuning it on the few examples causes severe overfitting. At the same time, training a simple linear classifier on top of "frozen" features learned from the large labeled dataset fails to adapt the model to the properties of the novel classes, effectively inducing underfitting. In this paper we propose an alternative to both of these popular strategies. First, our method pseudo-labels the entire large dataset using the linear classifier trained on the novel classes. This effectively "hallucinates" the novel classes in the large dataset, even though the novel categories are not present in the base dataset (novel and base classes are disjoint). Then, it finetunes the entire model with a distillation loss on the pseudo-labeled base examples, in addition to the standard cross-entropy loss on the novel dataset. This step effectively trains the network to recognize contextual and appearance cues that are useful for novel-category recognition while using the entire large-scale base dataset, thus overcoming the inherent data-scarcity problem of few-shot learning. Despite the simplicity of the approach, we show that our method outperforms the state of the art on four well-established few-shot classification benchmarks. The code is available at https://github.com/yiren-jian/LabelHalluc.
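
The two-step procedure described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract only states that a distillation loss on pseudo-labeled base examples is combined with cross-entropy on the novel examples, so the KL-divergence form of the distillation term, the `temperature` parameter, the `alpha` and `beta` weights, and all function names below are assumptions made for illustration. The `classifier` argument is a hypothetical callable that maps a feature vector to logits over the novel classes.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; temperature is an assumed knob."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def hallucinate_labels(base_features, classifier, temperature=1.0):
    """Step 1: pseudo-label every base example with the classifier
    trained on the novel classes (soft labels over novel classes)."""
    return [softmax(classifier(x), temperature) for x in base_features]

def cross_entropy(logits, label):
    """Standard cross-entropy for one labeled novel example."""
    return -math.log(softmax(logits)[label])

def distillation_loss(student_logits, soft_label, temperature=1.0):
    """Assumed form: KL(soft_label || student) for one base example."""
    student = softmax(student_logits, temperature)
    return sum(p * math.log(p / q)
               for p, q in zip(soft_label, student) if p > 0)

def combined_loss(novel_batch, base_batch, alpha=1.0, beta=1.0):
    """Step 2 objective: cross-entropy on novel data plus distillation
    on pseudo-labeled base data. alpha and beta are assumed weights.

    novel_batch: list of (logits, hard_label)
    base_batch:  list of (logits, soft_label)
    """
    ce = sum(cross_entropy(l, y) for l, y in novel_batch) / len(novel_batch)
    kd = sum(distillation_loss(l, s) for l, s in base_batch) / len(base_batch)
    return alpha * ce + beta * kd
```

In a real training loop the logits would come from the network being finetuned, and the soft labels from `hallucinate_labels` would be computed once, before finetuning begins, with the feature extractor still frozen.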

Effectiveness of Pre-training for Few-shot Intent Classification

Improving Few-shot Text Classification via Pretrained Language Representations

Fine-tuning Pre-trained Language Models for Few-shot Intent Detection: Supervised Pre-training and Isotropization

Few-shot Learning for Multi-label Intent Detection

Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training

When Low Resource NLP Meets Unsupervised Language Model: Meta-Pretraining then Meta-Learning for Few-Shot Text Classification (Student Abstract)

Feature Transformation for Few-Shot Learning

Evaluating the fairness of task-adaptive pretraining on unlabeled test data before few-shot text classification

New Intent Discovery with Pre-training and Contrastive Learning

Efficient Few-Shot Classification Via Contrastive Pretraining on Web Data.

Are Fewer Labels Possible for Few-shot Learning?

Label Hallucination for Few-Shot Classification

Mask-guided BERT for Few Shot Text Classification

Less is More: A Closer Look at Semantic-based Few-Shot Learning

Formulating Few-shot Fine-tuning Towards Language Model Pre-training: A Pilot Study on Named Entity Recognition

Learning to Classify Intents and Slot Labels Given a Handful of Examples

CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection

Are Pretrained Transformers Robust in Intent Classification? A Missing Ingredient in Evaluation of Out-of-Scope Intent Detection

Triple Channel Feature Fusion Few-Shot Intent Recognition With Orthogonality Constrained Multi-Head Attention

BERT for Joint Intent Classification and Slot Filling

FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models?