Abstract: As world knowledge evolves and new task schemas emerge, Continual Learning (CL) is crucial for keeping Large Language Models (LLMs) up-to-date and addressing their shortcomings. LLMs typically require continual instruction tuning (CIT) and continual pre-training (CPT) to adapt to new tasks and acquire essential knowledge. However, collecting enough CPT data to address knowledge gaps remains challenging, as does using that data efficiently. Inspired by the 'summarizing mistakes' strategy, we propose the Continue Evolving from Mistakes (CEM) method, a data-efficient approach that collects CPT data and continually improves LLMs' performance through iterative evaluation and supplementation with mistake-relevant knowledge. To enhance data utilization and mitigate forgetting, we introduce a novel training paradigm that combines CIT and CPT data. Experiments demonstrate that CEM significantly enhances model performance and continual evolution. The code and dataset are available on GitHub.

What problem does this paper attempt to address?

This paper asks how large language models (LLMs) can keep evolving and stay up to date as new knowledge and task requirements emerge. Specifically, it focuses on the following issues:

1. **Knowledge Update and Adaptation to New Tasks**: As world knowledge evolves and new task patterns emerge, LLMs need to keep learning to stay up to date and address their existing shortcomings.
2. **Data Collection and Utilization Efficiency**: During continual pre-training (CPT), it is challenging to collect enough CPT data to fill knowledge gaps, and it is equally important to use that data efficiently.
3. **Forgetting Problem**: During continual learning, the model is prone to forgetting knowledge it learned earlier (i.e., catastrophic forgetting), which degrades its overall performance.

To address these problems, the authors propose the "Continue Evolving from Mistakes (CEM)" method. Its main features are:

- **Error-based Data Collection**: By identifying the questions the model answers incorrectly, CEM collects a supplementary corpus (CPT data) in a targeted manner, directly addressing the model's knowledge deficiencies.
- **New Training Set Construction Paradigm**: CEM combines continual instruction tuning (CIT) data and continual pre-training (CPT) data in a new way of constructing training sets, designed to improve data utilization and reduce forgetting.

The experimental results show that CEM significantly improves model performance, with an accuracy improvement of up to 17.00%, and that the model keeps improving over multiple iterations.
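
The iterative evaluate-and-supplement loop described above can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`find_mistakes`, `cem_round`, `toy_answer`, `toy_retrieve`, `toy_train`) is hypothetical, the "model" is a plain dictionary, and "training" simply memorizes an answer when a retrieved passage supports it. It only shows the control flow: evaluate, collect mistake-relevant text, train on CPT and CIT data together, and repeat.

```python
from typing import Callable, Dict, List, Tuple

QA = Tuple[str, str]  # (question, reference answer)


def find_mistakes(answer_fn: Callable[[str], str], qa_pairs: List[QA]) -> List[QA]:
    """Return the QA pairs the model currently gets wrong (exact match, case-insensitive)."""
    return [(q, a) for q, a in qa_pairs
            if answer_fn(q).strip().lower() != a.strip().lower()]


def cem_round(answer_fn, qa_pairs, retrieve_fn, train_fn) -> int:
    """One hypothetical round: evaluate, gather mistake-relevant text, train."""
    mistakes = find_mistakes(answer_fn, qa_pairs)
    if not mistakes:
        return 0
    # CPT-style data: passages retrieved for each wrongly answered question.
    cpt_data = [doc for q, _ in mistakes for doc in retrieve_fn(q)]
    # CIT-style data: the wrongly answered questions with their reference answers.
    cit_data = list(mistakes)
    train_fn(cpt_data, cit_data)  # both kinds of data are used in the same update
    return len(mistakes)


# --- Toy stand-ins (assumptions for illustration only) ---
memory: Dict[str, str] = {"capital of France?": "Paris"}
corpus: Dict[str, List[str]] = {"capital of Japan?": ["Tokyo is the capital of Japan."]}


def toy_answer(question: str) -> str:
    return memory.get(question, "unknown")


def toy_retrieve(question: str) -> List[str]:
    return corpus.get(question, [])


def toy_train(cpt_docs: List[str], cit_pairs: List[QA]) -> None:
    # "Learn" an answer only when some retrieved passage mentions it.
    for q, a in cit_pairs:
        if any(a.lower() in doc.lower() for doc in cpt_docs):
            memory[q] = a


questions = [
    ("capital of France?", "Paris"),
    ("capital of Japan?", "Tokyo"),
    ("capital of Peru?", "Lima"),
]

# Number of mistakes found at the start of each of three rounds.
history = [cem_round(toy_answer, questions, toy_retrieve, toy_train) for _ in range(3)]
print(history)
```

In this toy setup the mistake count drops only for questions where supporting text can be retrieved, which mirrors the dependence of the approach on finding mistake-relevant knowledge.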

### Summary of Key Formulas

The formulas in the paper are mainly used to evaluate model performance and the effect of the method:

- **Enhancement Rate (ER)**:

  \[ ER = \frac{R^*_k - R^0_k}{R^0_k} \]

  where \(k\) is the task to which CEM is applied, \(R^0_k\) is the initial accuracy on task \(k\), and \(R^*_k\) is the accuracy on task \(k\) after applying CEM.

- **Average Forgetting Rate (AFR)**:

  \[ AFR = \frac{1}{N - 1}\sum_{i = 1,\, i \neq k}^{N}\frac{R^0_i - R^*_i}{R^0_i} \]

  where \(N\) is the number of tasks, \(R^0_i\) is the accuracy on task \(i\) after the initial fine-tuning, and \(R^*_i\) is the accuracy on task \(i\) after applying CEM to task \(k\).

ER measures the relative gain on the task CEM targets, while AFR measures the average relative accuracy loss on the other \(N - 1\) tasks, so a positive AFR indicates forgetting.
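
As a worked illustration of the two metrics, the sketch below computes ER and AFR directly from the definitions above. The function names and the accuracy values are made up for the example and are not results from the paper.

```python
from typing import Dict


def enhancement_rate(r0_k: float, r_star_k: float) -> float:
    """ER = (R*_k - R0_k) / R0_k for the task k that CEM was applied to."""
    return (r_star_k - r0_k) / r0_k


def average_forgetting_rate(r0: Dict[str, float], r_star: Dict[str, float], k: str) -> float:
    """AFR = mean over all tasks i != k of (R0_i - R*_i) / R0_i."""
    others = [task for task in r0 if task != k]
    if not others:
        raise ValueError("AFR needs at least one task other than k")
    return sum((r0[t] - r_star[t]) / r0[t] for t in others) / len(others)


# Illustrative accuracies only (not taken from the paper).
initial = {"A": 0.50, "B": 0.80, "C": 0.60}   # R0_i, after the initial fine-tuning
after = {"A": 0.60, "B": 0.76, "C": 0.60}     # R*_i, after applying CEM to task "A"

er = enhancement_rate(initial["A"], after["A"])
afr = average_forgetting_rate(initial, after, k="A")
print(f"ER = {er:.3f}, AFR = {afr:.3f}")
```

Here task A improves from 0.50 to 0.60, a relative gain of about 0.2, while task B loses 5% of its initial accuracy and task C is unchanged, giving an AFR of about 0.025.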

Large Language Model Can Continue Evolving From Mistakes
