Abstract:The recent success of Large Language Models (LLMs) has catalyzed an increasing interest in their self-correction capabilities. This paper presents a comprehensive investigation into the intrinsic self-correction of LLMs, attempting to address the ongoing debate about its feasibility. Our research has identified an important latent factor - the ``confidence'' of LLMs - during the self-correction process. Overlooking this factor may cause the models to over-criticize themselves, resulting in unreliable conclusions regarding the efficacy of self-correction. We have experimentally observed that LLMs possess the capability to understand the ``confidence'' in their own responses. It motivates us to develop an ``If-or-Else'' (IoE) prompting framework, designed to guide LLMs in assessing their own ``confidence'', facilitating intrinsic self-corrections. We conduct extensive experiments and demonstrate that our IoE-based Prompt can achieve a consistent improvement regarding the accuracy of self-corrected responses over the initial answers. Our study not only sheds light on the underlying factors affecting self-correction in LLMs, but also introduces a practical framework that utilizes the IoE prompting principle to efficiently improve self-correction capabilities with ``confidence''. The code is available at \url{https://github.com/MBZUAI-CLeaR/IoE-Prompting.git}.

What problem does this paper attempt to address?

### Problems the paper attempts to solve This paper aims to explore the inherent self - correction ability of large - language models (LLMs) and re - evaluate its feasibility. Specifically, the paper focuses on a crucial factor in LLMs' self - correction - **confidence**. The author believes that ignoring this factor may lead to excessive self - criticism of the model, resulting in unreliable conclusions. Therefore, the main objectives of the paper are: 1. **Identify the importance of confidence**: Research how LLMs evaluate the confidence level of their own answers and explore the impact of such confidence evaluation on the inherent self - correction process. 2. **Propose a new prompt framework**: Based on the understanding of confidence, develop an "If - or - Else" (IoE) prompt framework to guide LLMs to evaluate their own confidence and perform self - correction accordingly. 3. **Verify the effectiveness of the method**: Through a series of experiments, verify the effectiveness of the IoE prompt framework in improving the self - correction accuracy of LLMs and compare it with the existing critical prompt method. ### Main contributions - **Confidence evaluation**: Research shows that LLMs can effectively evaluate the confidence level of their own answers, especially in deterministic tasks and open - ended tasks. - **IoE prompt framework**: Propose a new prompt framework that guides LLMs to perform self - correction according to the confidence level through the "If - or - Else" principle. - **Experimental verification**: Conducted extensive experiments on multiple benchmarks, and the results show that the IoE prompt framework has significant advantages in improving self - correction accuracy. ### Experimental results - **Effectiveness of confidence evaluation**: In deterministic tasks, the confidence evaluation of LLMs is highly consistent with the consistency check results of multiple inferences; in open - ended tasks, LLMs can also effectively evaluate confidence. - **Effect of self - correction**: The IoE prompt framework shows better self - correction ability in tasks with different confidence levels, especially in low - confidence tasks, avoiding incorrect corrections caused by excessive criticism. - **Performance in multi - modal tasks**: In multi - modal tasks, the IoE prompt framework also shows superiority over standard prompts and critical prompt methods. ### Conclusion By introducing confidence evaluation and the IoE prompt framework, the paper significantly improves the self - correction ability of LLMs. This not only provides a new perspective for understanding the internal mechanisms of LLMs but also provides a practical framework for self - correction in practical applications.

Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models

Large Language Models have Intrinsic Self-Correction Ability

Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs

On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

Is Moral Self-correction An Innate Capability of Large Language Models? A Mechanistic Analysis to Self-correction

Large Language Models Cannot Self-Correct Reasoning Yet

Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

Small Language Model Can Self-correct

SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

Understanding the Dark Side of LLMs' Intrinsic Self-Correction

A Theoretical Understanding of Self-Correction through In-context Alignment

Smaller Large Language Models Can Do Moral Self-Correction

Learning From Correctness Without Prompting Makes LLM Efficient Reasoner

Large Language Models Can Self-Correct with Key Condition Verification

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies

Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis

Self-Cognition in Large Language Models: An Exploratory Study

On the Intersection of Self-Correction and Trust in Language Models

CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing

Small Language Models Need Strong Verifiers to Self-Correct Reasoning