SELF: Self-Evolution with Language Feedback

Jianqiao Lu, Wanjun Zhong, Wenyong Huang, Yufei Wang, Qi Zhu, Fei Mi, Baojun Wang, Weichao Wang, Xingshan Zeng, Lifeng Shang, Xin Jiang, Qun Liu
2024-02-01
Abstract: Large Language Models (LLMs) have demonstrated remarkable versatility across various domains. To further advance LLMs, we propose 'SELF' (Self-Evolution with Language Feedback), a novel approach that enables LLMs to self-improve through self-reflection, akin to human learning processes. SELF initiates with a meta-skill learning process that equips the LLMs with capabilities for self-feedback and self-refinement. Subsequently, the model undergoes an iterative process of self-evolution. In each iteration, it utilizes an unlabeled dataset of instructions to generate initial responses. These responses are enhanced through self-feedback and self-refinement. The model is then fine-tuned using this enhanced data. The model undergoes progressive improvement through this iterative self-evolution process. Moreover, the SELF framework enables the model to apply self-refinement during inference, which further improves response quality. Our experiments in mathematics and general tasks demonstrate that SELF can enhance the capabilities of LLMs without human intervention. The SELF framework indicates a promising direction for the autonomous evolution of LLMs, transitioning them from passive information receivers to active participants in their development.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper asks how large language models (LLMs) can evolve autonomously. Existing LLMs perform well across many tasks, but improving them mainly relies on manually annotated datasets and reinforcement learning methods, which require substantial resources and continuous human intervention. The paper therefore proposes SELF (Self-Evolution with Language Feedback), a framework that lets LLMs improve their own performance through self-feedback and self-refinement, reducing their dependence on external intervention. The framework has two main stages:

1. **Meta-skill learning stage**: The model learns the key meta-skills of self-feedback and self-refinement from a limited number of supervised examples. This lays the foundation for subsequent self-evolution.
2. **Self-evolution stage**: The model uses the acquired meta-skills to improve itself over multiple rounds of iterative training. In each round, it generates responses to unlabeled instructions, improves them through self-feedback and self-refinement, and is then fine-tuned on the resulting data, so its performance improves progressively.

SELF also emphasizes the role of natural-language feedback in guiding the model's evolution. Compared with traditional scalar rewards, natural-language feedback provides a more detailed and comprehensive evaluation, which helps the model identify errors and propose directions for improvement in complex reasoning tasks.

The paper evaluates SELF through experiments on mathematics and general tasks. The results show that SELF significantly improves the model's performance on mathematical tasks and also achieves better results on general tasks, supporting its potential for promoting the autonomous evolution of LLMs.
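
The self-evolution stage described above can be sketched as a short loop. This is only an illustrative outline, not the authors' implementation: the function `self_evolve`, the `Example` class, and the four `toy_*` callables are hypothetical names introduced here. In particular, the toy "model" is a plain dictionary of memorized answers, and `toy_feedback` computes the correct sum directly as a stand-in for the self-feedback skill that a real LLM would acquire during meta-skill learning.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Example:
    """One self-generated training pair (hypothetical container)."""
    prompt: str
    response: str


def self_evolve(
    model: Any,
    prompts: List[str],
    rounds: int,
    generate: Callable[[Any, str], str],
    give_feedback: Callable[[Any, str, str], str],
    refine: Callable[[Any, str, str, str], str],
    fine_tune: Callable[[Any, List[Example]], Any],
) -> Any:
    """Run `rounds` iterations of generate -> feedback -> refine -> fine-tune."""
    for _ in range(rounds):
        training_data = []
        for prompt in prompts:
            draft = generate(model, prompt)                   # initial response
            feedback = give_feedback(model, prompt, draft)    # language feedback
            improved = refine(model, prompt, draft, feedback) # refined response
            training_data.append(Example(prompt, improved))
        # Train on the refined responses; the result seeds the next round.
        model = fine_tune(model, training_data)
    return model


# --- Toy stand-ins (assumptions for illustration only) ---------------------
# The "model" is a dict mapping prompts like "2+3" to memorized answers.

def toy_generate(model: dict, prompt: str) -> str:
    # Answer from memory, or guess "0" for an unseen prompt.
    return model.get(prompt, "0")


def toy_feedback(model: dict, prompt: str, response: str) -> str:
    # Stand-in for learned self-feedback: checks the sum directly.
    a, b = prompt.split("+")
    expected = str(int(a) + int(b))
    if response == expected:
        return "correct"
    return f"incorrect; the sum should be {expected}"


def toy_refine(model: dict, prompt: str, response: str, feedback: str) -> str:
    # Keep a correct draft; otherwise take the value named in the feedback.
    if feedback == "correct":
        return response
    return feedback.rsplit(" ", 1)[-1]


def toy_fine_tune(model: dict, data: List[Example]) -> dict:
    # "Fine-tuning" here just memorizes the refined responses.
    updated = dict(model)
    for example in data:
        updated[example.prompt] = example.response
    return updated


evolved = self_evolve(
    model={},
    prompts=["2+3", "10+7"],
    rounds=2,
    generate=toy_generate,
    give_feedback=toy_feedback,
    refine=toy_refine,
    fine_tune=toy_fine_tune,
)
```

Passing the four steps in as callables keeps the loop itself independent of any particular model or training library. With a real LLM, each callable would presumably wrap a prompt to the same model (for generation, feedback, and refinement) or a fine-tuning job, and the same `give_feedback` and `refine` steps could be reused at inference time for the self-refinement the abstract mentions.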