Abstract: Large language models such as GPT-3 (Brown et al., 2020) can perform arbitrary tasks without undergoing fine-tuning after being prompted with only a few labeled examples. An arbitrary task can be reformulated as a natural language prompt, and a language model can be asked to generate the completion, indirectly performing the task in a paradigm known as prompt-based learning. To date, emergent prompt-based learning capabilities have mainly been demonstrated for unidirectional language models. However, bidirectional language models pre-trained on denoising objectives such as masked language modeling produce stronger learned representations for transfer learning. This motivates the possibility of prompting bidirectional models, but their pre-training objectives have made them largely incompatible with the existing prompting paradigm. We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models. Utilizing the machine translation task as a case study, we prompt the bidirectional mT5 model (Xue et al., 2021) with SAP and demonstrate that its few-shot and zero-shot translations outperform the few-shot translations of unidirectional models like GPT-3 and XGLM (Lin et al., 2021), despite mT5 having approximately 50% fewer parameters. We further show that SAP is effective on question answering and summarization. For the first time, our results demonstrate that prompt-based learning is an emergent property of a broader class of language models, rather than only unidirectional models.

Pre-trained Language Models can be Fully Zero-Shot Learners

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

What Makes Pre-trained Language Models Better Zero/Few-shot Learners?

Beyond Prompting: Making Pre-trained Language Models Better Zero-shot Learners by Clustering Representations

AdaPrompt: Adaptive Model Training for Prompt-based NLP

Making Pre-trained Language Models End-to-end Few-shot Learners with Contrastive Prompt Tuning

Prompting Language Models for Linguistic Structure

Exploring Lottery Prompts for Pre-trained Language Models

Unified Prompt Learning Makes Pre-Trained Language Models Better Few-Shot Learners

Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt

Bidirectional Language Models Are Also Few-shot Learners

Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance

Large Language Models Are Zero-Shot Text Classifiers

Zero-Label Prompt Selection

A Practical Survey on Zero-shot Prompt Design for In-context Learning

PPT: Pre-trained Prompt Tuning for Few-shot Learning

Knowledge Prompting in Pre-trained Language Model for Natural Language Understanding

Respectful or Toxic? Using Zero-Shot Learning with Language Models to Detect Hate Speech

HealthPrompt: A Zero-shot Learning Paradigm for Clinical Natural Language Processing

Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners