Abstract: In order to be deployed safely, Large Language Models (LLMs) must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a utility function that can encourage the model to produce responses only when it is confident in them. This utility function can be used to score generations of different lengths as well as abstention. To optimize this function, we introduce ReSearch, a process of "self-reflection" consisting of iterative self-prompting and self-evaluation. We use the ReSearch algorithm to generate synthetic data on which we finetune our models. Compared to their original versions, our resulting models generate fewer hallucinations overall at no additional inference cost, for both known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to abstain by augmenting the samples generated by the model during the search procedure with an answer expressing abstention.
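
The abstract does not give the form of the utility function, so the following is only a minimal sketch of one function with the stated properties, not the authors' definition. It assumes a response can be split into independent claims, that each claim earns +1 if true and costs a hypothetical `penalty` if false, and that abstention is encoded as an empty list of claims with utility 0. All names here are illustrative.

```python
from typing import Sequence


def expected_utility(claim_probs: Sequence[float], penalty: float = 2.0) -> float:
    """Expected utility of a response made of independent claims.

    Assumption: a claim believed true with probability p contributes
    p * (+1) + (1 - p) * (-penalty). An empty sequence stands for
    abstention and therefore scores exactly 0.
    """
    return sum(p - penalty * (1.0 - p) for p in claim_probs)


# Responses of different lengths and abstention are scored on the same scale.
confident = expected_utility([0.9, 0.95, 0.9])  # three claims the model trusts
unsure = expected_utility([0.5, 0.4])           # two claims it is unsure about
abstain = expected_utility([])                  # abstention

print(confident > abstain > unsure)
```

Under these assumptions a claim only raises the score when its probability exceeds `penalty / (1 + penalty)`, which is one way a single number can reward detailed answers on known topics and abstention on unknown ones.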

What problem does this paper attempt to address?

This paper addresses the problem that large language models (LLMs), when deployed, must be able to adjust their behavior dynamically according to how much they know, and how uncertain they are, about a specific topic. The authors call this adaptive behavior "self-restraint" and argue that it is crucial to the safety and reliability of the model. Because the behavior depends on the model's internal knowledge, it is not easy to teach. Standard training maximizes the likelihood of the next token, which does not teach the model to adjust its answers according to its uncertainty. To achieve self-restraint, the authors design a utility function that encourages the model to generate a response only when it is confident in it. They also introduce ReSearch, an iterative self-reflection algorithm that optimizes this utility function through self-prompting and self-evaluation. ReSearch generates synthetic data that are used to fine-tune the model, so that the model produces fewer hallucinations on both known and unknown topics without increasing inference cost. Specifically, the main contributions of the paper are:

1. **Design of the utility function**: A utility function that balances the quantity and accuracy of generated content, and that encourages the model to decline to answer when it is uncertain.
2. **ReSearch algorithm**: Iterative self-prompting and self-evaluation generate high-quality synthetic data for fine-tuning the model.
3. **Experimental verification**: Experiments were carried out on biography and historical event generation tasks. The results show that the model fine-tuned with the ReSearch algorithm reduces hallucinations while maintaining high accuracy and can choose not to answer as needed.

In conclusion, this paper aims to make large language models exhibit self-restraint when they are uncertain, by designing new training methods and algorithms, thereby improving their reliability and safety in practical applications.
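
The search procedure summarized above can be sketched as a loop. This is an illustration under stated assumptions, not the authors' implementation: a response is represented as a list of claims, `generate` and `self_evaluate` are hypothetical callables standing in for the model, the toy confidence values are made up, and the score assumes +1 per true claim and a hypothetical `penalty` per false one.

```python
from typing import Callable, List, Tuple

Claims = List[str]    # a response, represented as a list of atomic claims
ABSTAIN: Claims = []  # hypothetical encoding of an answer expressing abstention


def utility(probs: List[float], penalty: float) -> float:
    # Assumed scoring rule: +1 per true claim, -penalty per false claim.
    return sum(p - penalty * (1.0 - p) for p in probs)


def search(
    prompt: str,
    generate: Callable[[str], List[Claims]],
    self_evaluate: Callable[[str], float],
    rounds: int = 2,
    penalty: float = 2.0,
) -> Tuple[Claims, float]:
    # Abstention is always among the candidates, with utility 0.
    best, best_score = ABSTAIN, 0.0
    context = prompt
    for _ in range(rounds):
        for response in generate(context):
            # Self-evaluation: the model rates each of its own claims.
            score = utility([self_evaluate(c) for c in response], penalty)
            if score > best_score:
                best, best_score = response, score
        # Self-prompting: feed the best response so far into the next round.
        context = prompt + "\nBest attempt so far: " + "; ".join(best)
    return best, best_score


# Toy stand-ins for the model, with made-up confidence values.
CONFIDENCE = {"born in 1912": 0.95, "was a mathematician": 0.9, "won a Nobel Prize": 0.2}


def toy_generate(context: str) -> List[Claims]:
    if "Turing" in context:
        return [
            ["born in 1912", "was a mathematician", "won a Nobel Prize"],
            ["born in 1912", "was a mathematician"],
        ]
    return [["won a Nobel Prize"]]


def toy_evaluate(claim: str) -> float:
    return CONFIDENCE.get(claim, 0.0)


known, known_score = search("Write a biography of Alan Turing.", toy_generate, toy_evaluate)
unknown, unknown_score = search("Write a biography of Jane Q. Nobody.", toy_generate, toy_evaluate)
print(known, unknown)
```

In this sketch the selected `(prompt, response)` pairs would serve as the synthetic fine-tuning data, so the restraint is learned during training and the fine-tuned model needs no search at inference time.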

LLMs can learn self-restraint through iterative self-reflection