LLMs can learn self-restraint through iterative self-reflection

Alexandre Piché, Aristides Milios, Dzmitry Bahdanau, Chris Pal
2024-07-03
Abstract: In order to be deployed safely, Large Language Models (LLMs) must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a utility function that can encourage the model to produce responses only when it is confident in them. This utility function can be used to score generation of different length and abstention. To optimize this function, we introduce ReSearch, a process of "self-reflection" consisting of iterative self-prompting and self-evaluation. We use the ReSearch algorithm to generate synthetic data on which we finetune our models. Compared to their original versions, our resulting models generate fewer hallucinations overall at no additional inference cost, for both known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to abstain by augmenting the samples generated by the model during the search procedure with an answer expressing abstention.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses the need for large language models (LLMs) to adjust their behavior dynamically, at deployment time, according to how much they know and how uncertain they are about a given topic. The authors call this adaptive behavior "self-restraint" and treat it as crucial to the safety and reliability of the model. Because it depends on the model's internal knowledge, self-restraint is hard to teach. Standard training maximizes the likelihood of the next token, which does not teach the model to adjust its answers according to its uncertainty.

To achieve self-restraint, the authors design a utility function that encourages the model to generate a response only when it is confident in it. They also introduce ReSearch, an iterative self-reflection algorithm that optimizes this utility function through self-prompting and self-evaluation. ReSearch generates synthetic data that is then used to fine-tune the model, so that the fine-tuned model produces fewer hallucinations on both known and unknown topics without any increase in inference cost.

Specifically, the main contributions of the paper include:

1. **Design of the utility function**: A utility function that balances the quantity and accuracy of generated content, and that encourages the model to choose not to answer when it is uncertain.
2. **ReSearch algorithm**: Iterative self-prompting and self-evaluation that generate high-quality synthetic data for fine-tuning the model.
3. **Experimental verification**: Experiments on biography and historical event generation tasks. The results show that the model fine-tuned with the ReSearch algorithm reduces hallucinations while maintaining high accuracy and can choose not to answer as needed.
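The interplay between the utility function, the abstention candidate, and the iterative search can be illustrated with a minimal Python sketch. It is not the authors' implementation. The exact form of the utility, the `threshold` and `penalty` parameters, the `ABSTAIN` string, and the `generate` and `self_evaluate` callables are all assumptions made for illustration: here each claim judged likely true adds one point, each claim judged likely false subtracts `penalty`, and abstention scores zero.

```python
from typing import Callable, List, Sequence

# Hypothetical abstention answer; the paper's actual wording is not given here.
ABSTAIN = "I don't know enough about this topic to answer."


def utility(claim_confidences: Sequence[float],
            threshold: float = 0.5,
            penalty: float = 1.0) -> float:
    """Assumed utility: +1 per claim judged likely true, -penalty otherwise.

    An empty list (no claims made, i.e. abstention) scores 0, so answers of
    different lengths and abstention are all comparable on one scale.
    """
    score = 0.0
    for p in claim_confidences:
        score += 1.0 if p >= threshold else -penalty
    return score


def select_best(candidates: List[str],
                self_evaluate: Callable[[str], List[float]],
                penalty: float = 1.0) -> str:
    """Pick the highest-utility candidate, with abstention always in the pool."""
    scored = [(ABSTAIN, 0.0)]  # abstention augments the sampled answers
    for text in candidates:
        if text == ABSTAIN:
            continue  # already in the pool with utility 0
        confidences = self_evaluate(text)  # model scores its own claims
        scored.append((text, utility(confidences, penalty=penalty)))
    # max() keeps the first maximal item, so ties at 0 resolve to abstention.
    return max(scored, key=lambda pair: pair[1])[0]


def research(prompt: str,
             generate: Callable[[str, str], str],
             self_evaluate: Callable[[str], List[float]],
             rounds: int = 2,
             samples: int = 4,
             penalty: float = 1.0) -> str:
    """Iterate self-prompting and self-evaluation, keeping the best answer."""
    best = ABSTAIN
    for _ in range(rounds):
        # Self-prompting: the previous best answer is fed back to the generator.
        candidates = [generate(prompt, best) for _ in range(samples)]
        candidates.append(best)
        best = select_best(candidates, self_evaluate, penalty)
    return best
```

In this sketch, the answers returned by `research` over many prompts would form the synthetic fine-tuning set, which is how the restraint could be learned without adding inference cost. Raising `penalty` makes the selection more conservative, so abstention wins more often.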
In conclusion, this paper aims to make large language models exercise self-restraint when they are uncertain, by designing new training methods and algorithms, thereby improving their reliability and safety in practical applications.