Learn it or Leave it: Module Composition and Pruning for Continual Learning

Mingyang Wang, Heike Adel, Lukas Lange, Jannik Strötgen, Hinrich Schütze
2024-06-27
Abstract: In real-world environments, continual learning is essential for machine learning models, as they need to acquire new knowledge incrementally without forgetting what they have already learned. While pretrained language models have shown impressive capabilities on various static tasks, applying them to continual learning poses significant challenges, including avoiding catastrophic forgetting, facilitating knowledge transfer, and maintaining parameter efficiency. In this paper, we introduce MoCL-P, a novel lightweight continual learning method that addresses these challenges simultaneously. Unlike traditional approaches that continuously expand parameters for newly arriving tasks, MoCL-P integrates task representation-guided module composition with adaptive pruning, effectively balancing knowledge integration and computational overhead. Our evaluation across three continual learning benchmarks with up to 176 tasks shows that MoCL-P achieves state-of-the-art performance and improves parameter efficiency by up to three times, demonstrating its potential for practical applications where resource requirements are constrained.
Machine Learning, Computation and Language
What problem does this paper attempt to address?
The paper targets three main challenges in continual learning (CL):

1. **Avoiding catastrophic forgetting**: Newly learned information should not damage or degrade previously acquired knowledge. The model must keep acquiring new knowledge without forgetting what it has already learned.
2. **Promoting knowledge transfer**: Knowledge from past tasks should be reused to learn new tasks efficiently, which requires the model to share existing knowledge across tasks.
3. **Maintaining parameter efficiency**: The language model must remain lightweight when handling a large number of tasks. Its parameters cannot grow without limit as tasks accumulate; otherwise computational and storage costs rise significantly.

To solve these problems, the authors propose MoCL-P (Module Composition and Pruning for Continual Learning), a lightweight continual learning method. MoCL-P addresses the challenges as follows:

- **Avoiding catastrophic forgetting**: MoCL-P adds task-specific modules to the pretrained language model (PLM) to learn new tasks and freezes each module once training on its task is complete, so later tasks cannot overwrite it.
- **Promoting knowledge transfer**: MoCL-P reuses existing knowledge through module composition, which transfers knowledge across tasks.
- **Maintaining parameter efficiency**: MoCL-P applies an adaptive pruning strategy that removes modules carrying redundant information and retains only the most important ones, so the model stays lightweight throughout the continual learning process.
Through these methods, MoCL-P achieves state-of-the-art performance on three continual learning benchmarks with up to 176 tasks and improves parameter efficiency by up to three times, which makes it a practical continual learning solution for resource-constrained environments.

### Formula Representation

The relevant formulas are:

- Training objective:

  \[ \min_{P_m, v_m} -\sum_{x_n, y_n} \log p(y_n \mid x_n, P'_n, \theta) - \sum_{x_n} \cos(x_n, v_m) \]

  where \(P'_n = \sum_{k=1}^{m} \alpha_k P_k\) is the weighted sum of the new trainable task module and the existing frozen task modules.

- Module matching weight:

  \[ \alpha_k = \cos(x_n, v_k) \]

The first term fits the current task through the composed module \(P'_n\). The second term pulls the task representation \(v_m\) toward the inputs of task \(m\), so the matching weights reflect how closely an input resembles each task.
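The two formulas above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the flattening of each module into a plain list of floats, the assumption that \(x_n\) is a fixed-size input embedding, and the `log_prob_gold` stand-in for the PLM log-likelihood are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def matching_weights(x, task_vectors):
    """alpha_k = cos(x, v_k) for every task representation v_1..v_m."""
    return [cosine(x, v) for v in task_vectors]


def compose_modules(alphas, modules):
    """P' = sum_k alpha_k * P_k (each module is a flat list of floats here)."""
    size = len(modules[0])
    return [sum(a * p[i] for a, p in zip(alphas, modules)) for i in range(size)]


def objective(log_prob_gold, x, v_m):
    """Per-example objective: -log p(y | x, P', theta) - cos(x, v_m).

    log_prob_gold is a placeholder for the log-likelihood that the PLM
    would assign to the gold label when conditioned on the composed module.
    """
    return -log_prob_gold - cosine(x, v_m)


# Toy values, made up for illustration only.
x = [1.0, 0.0]                            # assumed input embedding x_n
task_vectors = [[1.0, 0.0], [0.0, 1.0]]   # v_1 (old task), v_2 (current task)
modules = [[2.0, 4.0], [10.0, 20.0]]      # P_1 (frozen), P_2 (trainable)

alphas = matching_weights(x, task_vectors)
composed = compose_modules(alphas, modules)
loss = objective(math.log(0.5), x, task_vectors[-1])
```

In this toy setup the input aligns with the first task vector only, so the composed module reduces to the first module. In a real training loop, only the newest module and its task vector would receive gradient updates, matching the minimization over \(P_m\) and \(v_m\).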