Abstract: Pretraining architectures come in several types, including autoregressive models (e.g., GPT), autoencoding models (e.g., BERT), and encoder-decoder models (e.g., T5). NLP tasks, in turn, differ in nature and fall into three main categories: classification, unconditional generation, and conditional generation. However, no existing pretraining framework performs best on all of these tasks, which complicates model development and selection. We propose a novel pretraining framework, GLM (General Language Model), to address this challenge. Compared to previous work, our architecture has three major benefits: (1) it performs well on classification, unconditional generation, and conditional generation tasks with a single pretrained model; (2) it outperforms BERT-like models on classification due to improved pretrain-finetune consistency; (3) it naturally handles variable-length blank filling, which is crucial for many downstream tasks. Empirically, GLM substantially outperforms BERT on the SuperGLUE natural language understanding benchmark with the same amount of pretraining data. Moreover, GLM with 1.25× the parameters of BERT-Large simultaneously achieves the best performance in NLU, conditional generation, and unconditional generation, which demonstrates its generalizability to different downstream tasks. The code and pretrained models are available at https://github.com/THUDM/GLM.

Author notes: Equal contribution. Affiliations: Department of Computer Science and Technology, Tsinghua University, Beijing, China; Beijing Academy of Artificial Intelligence, Beijing, China; Massachusetts Institute of Technology, Cambridge, U.S.A.; Recurrent AI, Ltd. Correspondence to: Zhilin Yang <kimi_yang@rcrai.com>, Jie Tang <jietang@tsinghua.edu.cn>.

All NLP Tasks Are Generation Tasks: A General Pretraining Framework.
