Abstract: Most recent progress in natural language understanding (NLU) has been driven, in part, by benchmarks such as GLUE, SuperGLUE, and SQuAD. In fact, many NLU models have now matched or exceeded "human-level" performance on many tasks in these benchmarks. Most of these benchmarks, however, give models access to relatively large amounts of labeled data for training. As such, the models are provided far more data than humans require to achieve strong performance. That has motivated a line of work that focuses on improving the few-shot learning performance of NLU models. However, there is a lack of standardized evaluation benchmarks for few-shot NLU, resulting in different experimental settings in different papers. To help accelerate this line of work, we introduce CLUES (Constrained Language Understanding Evaluation Standard), a benchmark for evaluating the few-shot learning capabilities of NLU models. We demonstrate that while recent models reach human performance when they have access to large amounts of labeled data, there is a huge gap in performance in the few-shot setting for most tasks. We also demonstrate differences between alternative model families and adaptation techniques in the few-shot setting. Finally, we discuss several principles and choices in designing the experimental settings for evaluating the true few-shot learning performance and suggest a unified standardized approach to few-shot learning evaluation. We aim to encourage research on NLU models that can generalize to new tasks with a small number of examples. Code and data for CLUES are available at <a class="link-external link-https" href="https://github.com/microsoft/CLUES" rel="external noopener nofollow">this https URL</a>.

When Few-Shot Learning Meets Large-Scale Knowledge-Enhanced Pre-training: Alibaba at FewCLUE

FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark

CLUES: Few-Shot Learning Evaluation in Natural Language Understanding

The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning

Improving Few-shot Text Classification via Pretrained Language Representations

When Low Resource NLP Meets Unsupervised Language Model: Meta-Pretraining then Meta-Learning for Few-Shot Text Classification (Student Abstract)

Collaboration of Pre-trained Models Makes Better Few-shot Learner

Less is More: A Closer Look at Semantic-based Few-Shot Learning

FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding

Entailment Method Based on Template Selection for Chinese Text Few-shot Learning

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

CLG: Contrastive Label Generation with Knowledge for Few-Shot Learning

MICM: Rethinking Unsupervised Pretraining for Enhanced Few-shot Learning

Unified View Empirical Study for Large Pretrained Model on Cross-Domain Few-Shot Learning

CLUE: A Chinese Language Understanding Evaluation Benchmark

Revisiting and Advancing Chinese Natural Language Understanding with Accelerated Heterogeneous Knowledge Pre-training

Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

Few-shot Food Recognition with Pre-trained Model

A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters

EasyNLP: A Comprehensive and Easy-to-use Toolkit for Natural Language Processing