Abstract: There have been various types of pretraining architectures including autoregressive models (e.g., GPT), autoencoding models (e.g., BERT), and encoder-decoder models (e.g., T5). On the other hand, NLP tasks are different in nature, with three main categories being classification, unconditional generation, and conditional generation. However, none of the pretraining frameworks performs the best for all tasks, which introduces inconvenience for model development and selection. We propose a novel pretraining framework GLM (General Language Model) to address this challenge. Compared to previous work, our architecture has three major benefits: (1) it performs well on classification, unconditional generation, and conditional generation tasks with one single pretrained model; (2) it outperforms BERT-like models on classification due to improved pretrain-finetune consistency; (3) it naturally handles variable-length blank filling which is crucial for many downstream tasks. Empirically, GLM substantially outperforms BERT on the SuperGLUE natural language understanding benchmark with the same amount of pre-training data. Moreover, GLM with 1.25× parameters of BERT-Large achieves the best performance in NLU, conditional and unconditional generation at the same time, which demonstrates its generalizability to different downstream tasks.

Equal contribution. Affiliations: Department of Computer Science and Technology, Tsinghua University, Beijing, China; Beijing Academy of Artificial Intelligence, Beijing, China; Massachusetts Institute of Technology, Cambridge, U.S.A.; Recurrent AI, Ltd.

Correspondence to: Zhilin Yang <kimi_yang@rcrai.com>, Jie Tang <jietang@tsinghua.edu.cn>.

The code and pre-trained models are available at https://github.com/THUDM/GLM

GLM: General Language Model Pretraining with Autoregressive Blank Infilling

All NLP Tasks Are Generation Tasks: A General Pretraining Framework

Generalization algorithm of multimodal pre-training model based on graph-text self-supervised training

General Point Model with Autoencoding and Autoregressive

XLNet: Generalized Autoregressive Pretraining for Language Understanding

GLM-130B: An Open Bilingual Pre-trained Model

GLID: Pre-training a Generalist Encoder-Decoder Vision Model

MPNet: Masked and Permuted Pre-training for Language Understanding

ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions

Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI

AutoML-GPT: Automatic Machine Learning with GPT

G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks

mGPT: Few-Shot Learners Go Multilingual

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

Accelerating Vision-Language Pretraining with Free Language Modeling

Large Language Models as Data Preprocessors

In-context Pretraining: Language Modeling Beyond Document Boundaries

Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE

GLGE: A New General Language Generation Evaluation Benchmark

Generate to Understand for Representation