Cup Curriculum: Curriculum Learning on Model Capacity

Luca Scharr, Vanessa Toborek
2023-11-07
Abstract: Curriculum learning (CL) aims to increase the performance of a learner on a given task by applying a specialized learning strategy. This strategy focuses on either the dataset, the task, or the model. There is little to no work analysing the application of CL to model capacity in natural language processing. To close this gap, we propose the cup curriculum. In a first phase of training, we use a variation of iterative magnitude pruning to reduce model capacity. The pruned weights are reintroduced in a second phase, so that the model capacity follows a cup-shaped curve over the training iterations. We empirically evaluate different strategies of the cup curriculum and show that it outperforms early stopping reliably while exhibiting a high resilience to overfitting.
Machine Learning
What problem does this paper attempt to address?
The paper investigates whether Curriculum Learning (CL) can be applied effectively to model capacity in Natural Language Processing (NLP). Specifically, the authors propose a new method called the "cup curriculum": model capacity is first reduced during training and then gradually restored, so that capacity traces a cup-shaped curve over the training iterations. The approach aims to overcome the limitations of existing techniques such as early stopping and Iterative Magnitude Pruning (IMP), particularly with respect to preventing overfitting and improving generalization.

The main contributions of the paper are:

1. **Proposing a Model Curriculum**: The paper analyses curriculum learning applied to model capacity in NLP, an area the authors note has seen little to no prior work, and introduces the "cup curriculum".
2. **Testing Different Strategies**: Extensive experiments evaluate different cup curriculum strategies, including various reset, initialization, and update schemes.
3. **Providing Usable Code**: To facilitate future research, the authors provide easy-to-use code.

Through this work, the authors show that the cup curriculum reliably outperforms early stopping across various model sizes and is highly resistant to overfitting, which offers a new direction for model optimization in NLP.
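To make the cup-shaped capacity curve concrete, here is a minimal NumPy sketch of the two phases on a single weight array. It is an illustration under stated assumptions, not the authors' implementation: the function names `prune_smallest` and `cup_masks`, the fixed per-cycle pruning fraction, and the choice to reintroduce weights in the exact reverse order of pruning are all hypothetical. Training between cycles is omitted, so the weights stay fixed here, whereas in practice they would change from one cycle to the next.

```python
import numpy as np


def prune_smallest(weights, mask, prune_fraction):
    """Return a copy of `mask` with the smallest-magnitude active weights disabled.

    `prune_fraction` is the share of the currently active weights to remove
    (an assumed hyperparameter, not taken from the paper).
    """
    flat_mask = mask.ravel()
    active = np.flatnonzero(flat_mask)
    n_prune = int(round(prune_fraction * active.size))
    new_mask = flat_mask.copy()
    if n_prune > 0:
        # Sort the active weights by absolute value, smallest first.
        order = np.argsort(np.abs(weights.ravel()[active]))
        new_mask[active[order[:n_prune]]] = False
    return new_mask.reshape(mask.shape)


def cup_masks(weights, n_cycles, prune_fraction):
    """Build the sequence of masks for a pruning phase followed by a regrowth phase.

    Phase 1 prunes `n_cycles` times; phase 2 replays the masks in reverse,
    which is one simple (assumed) way to reintroduce the pruned weights.
    """
    mask = np.ones(weights.shape, dtype=bool)
    history = [mask]
    for _ in range(n_cycles):
        mask = prune_smallest(weights, mask, prune_fraction)
        history.append(mask)
    # Mirror the pruning phase, skipping the duplicated minimum-capacity mask.
    return history + history[-2::-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(10, 10))
    capacities = [int(m.sum()) for m in cup_masks(w, n_cycles=3, prune_fraction=0.2)]
    print(capacities)  # number of active weights per cycle
```

Plotting the printed capacities against the cycle index gives the cup shape: capacity falls during the pruning phase, reaches its minimum in the middle, and returns to the full model by the end. A real setup would train the masked model between cycles and would need to decide how reintroduced weights are initialized and updated, which is what the reset, initialization, and update schemes in the paper address.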