Abstract: Most curriculum learning methods require sorting the data samples by difficulty, which is often cumbersome. In this work, we propose a novel curriculum learning approach termed Learning Rate Curriculum (LeRaC), which uses a different learning rate for each layer of a neural network to create a data-agnostic curriculum during the initial training epochs. More specifically, LeRaC assigns higher learning rates to neural layers closer to the input, gradually decreasing the learning rates as the layers are placed farther away from the input. The learning rates increase at various paces during the first training iterations, until they all reach the same value. From this point on, the neural model is trained as usual. This creates a model-level curriculum learning strategy that does not require sorting the examples by difficulty and is compatible with any neural network, generating higher performance levels regardless of the architecture. We conduct comprehensive experiments on 12 data sets from the computer vision (CIFAR-10, CIFAR-100, Tiny ImageNet, ImageNet-1K, Food-101, UTKFace, PASCAL VOC), language (BoolQ, QNLI, RTE) and audio (ESC-50, CREMA-D) domains, considering various convolutional (ResNet-18, Wide-ResNet-50, DenseNet-121, YOLOv5), recurrent (LSTM) and transformer (CvT, BERT, SepTr) architectures. We compare our approach with the conventional training regime, as well as with Curriculum by Smoothing (CBS), a state-of-the-art data-agnostic curriculum learning approach. Unlike CBS, our performance improvements over the standard training regime are consistent across all data sets and models. Furthermore, we significantly surpass CBS in terms of training time (LeRaC adds no cost over the standard training regime). Our code is freely available at: https://github.com/CroitoruAlin/LeRaC
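The per-layer schedule described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact update rule, so the geometric spacing of the initial rates, the geometric interpolation toward the shared value, and all names and defaults here (`lerac_learning_rates`, `target_lr`, `min_lr`, `warmup_steps`) are assumptions made for illustration, not the authors' implementation.

```python
def lerac_learning_rates(num_layers, step, warmup_steps,
                         target_lr=1e-3, min_lr=1e-8):
    """Return one learning rate per layer (index 0 = closest to the input).

    Hypothetical schedule: initial rates shrink geometrically with depth,
    then each rate grows geometrically until all layers share target_lr.
    """
    if num_layers < 1 or warmup_steps < 1:
        raise ValueError("num_layers and warmup_steps must be >= 1")

    # After the warm-up phase every layer uses the same rate,
    # so training proceeds as in the standard regime.
    if step >= warmup_steps:
        return [target_lr] * num_layers

    progress = step / warmup_steps  # fraction of the warm-up completed
    rates = []
    for layer in range(num_layers):
        # Relative depth: 0.0 for the input layer, 1.0 for the last layer.
        depth = layer / (num_layers - 1) if num_layers > 1 else 0.0
        # Assumed initial rate: target_lr at the input, min_lr at the output.
        initial = target_lr * (min_lr / target_lr) ** depth
        # Deeper layers start lower and therefore climb at a faster pace.
        rates.append(initial * (target_lr / initial) ** progress)
    return rates
```

In a framework such as PyTorch, values like these could be written to per-layer optimizer parameter groups at each iteration of the warm-up phase; that wiring is left out here to keep the sketch dependency-free.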

Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks

Theory of Curriculum Learning, with Convex Loss Functions

An Analytical Theory of Curriculum Learning in Teacher-Student Networks

An Empirical Exploration of Curriculum Learning for Neural Machine Translation

Spatial Transformer Networks for Curriculum Learning

The Common Intuition to Transfer Learning Can Win or Lose: Case Studies for Linear Regression

Learning Curves for Deep Neural Networks: A Gaussian Field Theory Perspective

Mapping the Learning Curves of Deep Learning Networks

Features are fate: a theory of transfer learning in high-dimensional regression

EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones

Learning Rate Curriculum

Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias

Frozen Overparameterization: A Double Descent Perspective on Transfer Learning of Deep Neural Networks

On the Learning Dynamics of Two-layer Nonlinear Convolutional Neural Networks

Curriculum generation using Autoencoder based continuous optimization

Curriculum Loss: Robust Learning and Generalization against Label Corruption

When Do Curricula Work?

Universality in Transfer Learning for Linear Models

Convergence Analysis of Two-layer Neural Networks with ReLU Activation

Statistical Measures For Defining Curriculum Scoring Function