Continual Learning of Large Language Models: A Comprehensive Survey

Haizhou Shi, Zihao Xu, Hengyi Wang, Weiyi Qin, Wenyuan Wang, Yibin Wang, Zifeng Wang, Sayna Ebrahimi, Hao Wang
2024-06-30
Abstract: The recent success of large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. Pre-trained LLMs, when tailored for specific needs, often experience significant performance degradation in previous knowledge domains, a phenomenon known as "catastrophic forgetting". While extensively studied in the continual learning (CL) community, it presents new manifestations in the realm of LLMs. In this survey, we provide a comprehensive overview of the current research progress on LLMs within the context of CL. This survey is structured into four main sections: we first describe an overview of continually learning LLMs, consisting of two directions of continuity: vertical continuity (or vertical continual learning), i.e., continual adaptation from general to specific capabilities, and horizontal continuity (or horizontal continual learning), i.e., continual adaptation across time and domains (Section 3). We then summarize three stages of learning LLMs in the context of modern CL: Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT) (Section 4). Then we provide an overview of evaluation protocols for continual learning with LLMs, along with the currently available data sources (Section 5). Finally, we discuss intriguing questions pertaining to continual learning for LLMs (Section 6). The full list of papers examined in this survey is available at https://github.com/Wang-ML-Lab/llm-continual-learning-survey.
Machine Learning, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper addresses how to adapt static, pre-trained large language models (LLMs) effectively and efficiently to continuously changing data distributions. When these pre-trained LLMs are customized for specific requirements, their performance in previously learned knowledge domains often drops significantly, a phenomenon known as "catastrophic forgetting". Although this problem has been widely studied in the continual learning (CL) community, it takes on new forms in LLMs. The paper therefore aims to give a comprehensive overview of current research on LLMs in the context of CL, focusing on how these models can adapt to new data and new tasks without forgetting previously learned knowledge. It organizes the challenge along two directions of continuity: vertical continual learning (continual adaptation from general to specific capabilities) and horizontal continual learning (continual adaptation across time and domains). It then discusses three stages of learning, namely continual pre-training (CPT), domain-adaptive pre-training (DAP), and continual fine-tuning (CFT), and reviews evaluation protocols for continually learning LLMs along with currently available data sources. Finally, it explores open questions about continual learning for LLMs, emphasizing the need for practical, accessible evaluation benchmarks and for methods specifically designed to combat forgetting and enable knowledge transfer.
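Catastrophic forgetting is usually quantified from an accuracy matrix recorded while a model learns a sequence of tasks. The sketch below is a minimal illustration, assuming the common convention that R[i][j] is the accuracy on task j after training on task i (0-indexed), and computes three metrics widely used in the CL literature: average final accuracy, backward transfer (negative values indicate forgetting), and one common variant of average forgetting. The function names and the 3-task accuracy values are hypothetical, not results or protocols taken from this survey.

```python
from typing import List

Matrix = List[List[float]]  # R[i][j]: accuracy on task j after training on task i


def average_accuracy(R: Matrix) -> float:
    """Mean accuracy over all T tasks, measured after training on the final task."""
    T = len(R)
    return sum(R[T - 1][j] for j in range(T)) / T


def backward_transfer(R: Matrix) -> float:
    """Mean change on tasks 0..T-2 between just after learning them and the end.

    Negative values mean later training hurt earlier tasks (forgetting).
    Assumes T >= 2.
    """
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


def average_forgetting(R: Matrix) -> float:
    """Mean drop from each earlier task's best accuracy (steps j..T-2) to its final accuracy.

    One common variant of the forgetting measure; assumes T >= 2.
    """
    T = len(R)
    drops = [
        max(R[l][j] for l in range(j, T - 1)) - R[T - 1][j]
        for j in range(T - 1)
    ]
    return sum(drops) / len(drops)


# Hypothetical accuracies for a model trained on 3 tasks in sequence.
# Row i: evaluated after training on task i; column j: task j.
R = [
    [0.90, 0.10, 0.05],
    [0.70, 0.85, 0.10],
    [0.60, 0.75, 0.80],
]

print(f"ACC = {average_accuracy(R):.3f}")    # ACC = 0.717
print(f"BWT = {backward_transfer(R):.3f}")   # BWT = -0.200
print(f"F   = {average_forgetting(R):.3f}")  # F   = 0.200
```

Final accuracy alone can hide forgetting (a model may score well mainly on the most recent task), which is why backward transfer or a forgetting measure is typically reported alongside it.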