Continual Learning of Large Language Models: A Comprehensive Survey

Haizhou Shi, Zihao Xu, Hengyi Wang, Weiyi Qin, Wenyuan Wang, Yibin Wang, Zifeng Wang, Sayna Ebrahimi, Hao Wang

2024-06-30

Abstract: The recent success of large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. Pre-trained LLMs, when tailored for specific needs, often experience significant performance degradation in previous knowledge domains -- a phenomenon known as "catastrophic forgetting". While extensively studied in the continual learning (CL) community, it presents new manifestations in the realm of LLMs. In this survey, we provide a comprehensive overview of the current research progress on LLMs within the context of CL. This survey is structured into four main sections: we first describe an overview of continually learning LLMs, consisting of two directions of continuity: vertical continuity (or vertical continual learning), i.e., continual adaptation from general to specific capabilities, and horizontal continuity (or horizontal continual learning), i.e., continual adaptation across time and domains (Section 3). We then summarize three stages of learning LLMs in the context of modern CL: Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT) (Section 4). Then we provide an overview of evaluation protocols for continual learning with LLMs, along with the currently available data sources (Section 5). Finally, we discuss intriguing questions pertaining to continual learning for LLMs (Section 6). The full list of papers examined in this survey is available at https://github.com/Wang-ML-Lab/llm-continual-learning-survey.

Machine Learning, Artificial Intelligence, Computation and Language

What problem does this paper attempt to address?

This paper addresses the problem of adapting static, pre-trained large language models (LLMs) effectively and efficiently to continuously changing data distributions. When pre-trained LLMs are customized to meet specific requirements, their performance in previous knowledge domains often drops significantly, a phenomenon known as "catastrophic forgetting". Although this problem has been widely studied in the continual learning (CL) community, it presents new manifestations in the field of LLMs. The goal of the paper is therefore to provide a comprehensive overview and detailed discussion of current research progress on LLMs in the context of CL, with a particular focus on how to enable these models to adapt to new data and new tasks without forgetting previously learned knowledge. The paper divides continual learning into two main directions: vertical continual learning (continual adaptation from general to specific capabilities) and horizontal continual learning (continual adaptation across time and domains). It also discusses three stages of learning: Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT), and it reviews evaluation protocols for continual learning with LLMs along with currently available data sources. Finally, the paper explores open questions related to continual learning of LLMs, emphasizing the need to develop practical and accessible evaluation benchmarks and specially designed methods to combat forgetting and achieve knowledge transfer.

Continual Learning for Large Language Models: A Survey

Recent Advances of Foundation Language Models-based Continual Learning: A Survey

Towards Lifelong Learning of Large Language Models: A Survey

Investigating Continual Pretraining in Large Language Models: Insights and Implications

Towards Continual Knowledge Learning of Language Models

A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

Recent Advances of Multimodal Continual Learning: A Comprehensive Survey

Online Continual Knowledge Learning for Language Models

Large Language Model Can Continue Evolving From Mistakes

A Comprehensive Survey of Continual Learning: Theory, Method and Application

A Survey on Evaluation of Large Language Models

Continual Learning with Pre-Trained Models: A Survey

Examining Forgetting in Continual Pre-training of Aligned Large Language Models

History, Development, and Principles of Large Language Models-An Introductory Survey

Improving Multimodal Large Language Models Using Continual Learning

Multilingual Large Language Models: A Systematic Survey

Evaluating Large Language Models: A Comprehensive Survey