Abstract: With the notable success of pretrained language models, the pretraining-fine-tuning paradigm has become a dominant approach to natural language understanding (NLU) tasks. Typically, the training instances of a target NLU task are presented in a completely random order and treated equally during fine-tuning. However, these instances can vary greatly in difficulty, and, much as in human learning, language models can benefit from an easy-to-difficult curriculum. Based on this idea, we propose a curriculum learning (CL) framework with two stages, Review and Arrange, which target the two main challenges in curriculum learning: how to define the difficulty of instances, and how to arrange a curriculum based on that difficulty. In the first stage, we devise a cross-review (CR) method that first trains several teacher models and then has them review the training set in a crossed manner to distinguish easy instances from difficult ones. In the second stage, we propose two sampling algorithms, a coarse-grained arrangement (CGA) and a fine-grained arrangement (FGA), which arrange a curriculum for language models that starts from the easiest instances and gradually adds more difficult ones to the training procedure. Compared with previous heuristic CL methods, our framework avoids the errors caused by the mismatch between what humans and what machines find difficult, and it has strong generalization ability. Comprehensive experiments show that our curriculum learning framework, without any manual model architecture design or use of external data, obtains significant and universal performance improvements on a wide range of NLU tasks in different languages.
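The two stages can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the training set is split into `n_splits` disjoint shards with one teacher trained per shard; each teacher scores (1.0 correct, 0.0 wrong) only the instances outside its own shard; these scores are summed into a per-instance "easiness" value (higher means easier). The arrangement shown is a simplified stand-in for CGA: instances are sorted easiest-first, cut into `n_buckets` buckets, and stage *s* trains on the union of the *s* easiest buckets. FGA is not sketched. The names `cross_review`, `coarse_grained_arrangement`, and `train_threshold_teacher` are hypothetical, and the threshold "teacher" is a toy stand-in for a fine-tuned language model.

```python
import random
from statistics import mean
from typing import Any, Callable, List, Sequence

# A trained teacher maps one labeled instance to a score in [0, 1]
# (assumption: 1.0 = predicted correctly, 0.0 = predicted wrongly).
Scorer = Callable[[Any], float]


def cross_review(data: Sequence[Any], n_splits: int,
                 train_teacher: Callable[[List[Any]], Scorer],
                 seed: int = 0) -> List[float]:
    """Review stage (sketch): return an easiness score per instance.

    Each teacher is trained on one shard and scores only the instances
    in the other shards, so every score comes from a model that never
    saw that instance during training.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    shards = [idx[k::n_splits] for k in range(n_splits)]
    teachers = [train_teacher([data[i] for i in shard]) for shard in shards]

    easiness = [0.0] * len(data)
    for k, shard in enumerate(shards):
        for j, teacher in enumerate(teachers):
            if j == k:
                continue  # a teacher never reviews its own training shard
            for i in shard:
                easiness[i] += teacher(data[i])
    return easiness


def coarse_grained_arrangement(data: Sequence[Any], easiness: Sequence[float],
                               n_buckets: int, seed: int = 0) -> List[List[Any]]:
    """Arrange stage (simplified CGA): one training pool per curriculum stage.

    Instances are sorted easiest-first and cut into n_buckets buckets;
    stage s trains on the union of the s easiest buckets, shuffled.
    """
    order = sorted(range(len(data)), key=lambda i: -easiness[i])
    size = -(-len(order) // n_buckets)  # ceiling division
    buckets = [order[b * size:(b + 1) * size] for b in range(n_buckets)]
    rng = random.Random(seed)
    stages = []
    for s in range(1, n_buckets + 1):
        pool = [data[i] for bucket in buckets[:s] for i in bucket]
        rng.shuffle(pool)
        stages.append(pool)
    return stages


def train_threshold_teacher(train: List[Any]) -> Scorer:
    """Hypothetical toy teacher: 1-D threshold halfway between class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    t = (mean(pos) + mean(neg)) / 2 if pos and neg else 0.0
    return lambda ex: 1.0 if (ex[0] > t) == (ex[1] == 1) else 0.0


if __name__ == "__main__":
    # Toy data: label = (x > 0), plus two deliberately mislabeled points.
    data = [(x, int(x > 0)) for x in range(-10, 11)] + [(1, 0), (-1, 1)]
    scores = cross_review(data, n_splits=4, train_teacher=train_threshold_teacher)
    for stage, pool in enumerate(coarse_grained_arrangement(data, scores, 3), 1):
        print(f"stage {stage}: {len(pool)} instances")
```

With 23 toy instances and 3 buckets, the stages grow from 8 to 16 to 23 instances, so the final stage covers the full training set. Scoring each instance only with teachers that did not train on it is what keeps the difficulty estimate from being inflated by memorization.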
