Abstract: Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their generated content. A contemporary methodology, self-correction, has been proposed as a remedy to these issues. Building upon this premise, this paper critically examines the role and efficacy of self-correction within LLMs, shedding light on its true potential and limitations. Central to our investigation is the notion of intrinsic self-correction, whereby an LLM attempts to correct its initial responses based solely on its inherent capabilities, without the crutch of external feedback. In the context of reasoning, our research indicates that LLMs struggle to self-correct their responses without external feedback, and at times, their performance even degrades after self-correction. Drawing from these insights, we offer suggestions for future research and practical applications in this field.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper primarily explores the self-correction ability of large language models (LLMs) in reasoning tasks. Specifically, the researchers focus on the following issues:

1. **Effectiveness of Self-Correction**: Although large language models have demonstrated excellent text generation capabilities in various applications, the accuracy and appropriateness of their generated content remain a concern. Self-correction, as a potential solution, aims to improve the initial response through the model's own feedback. However, whether this self-correction is truly effective, especially in the absence of external feedback, remains an open question.
2. **Limitations of Intrinsic Self-Correction**: The researchers define "intrinsic self-correction" as the model's ability to correct its initial response relying solely on its own capabilities, without external feedback. The core question of the paper is: if the model can self-correct, why can't it provide the correct answer on the first attempt? The findings indicate that LLMs struggle to self-correct effectively without external feedback, and that their performance sometimes declines after self-correction.
3. **Soundness of Evaluation Methods**: Some existing studies claim that self-correction can significantly improve model performance, but these studies often rely on "oracle labels" to guide the self-correction process. When these labels are unavailable, the performance improvement is not significant. Additionally, the researchers point out the shortcomings of methods like multi-agent debate compared to self-consistency.
4. **Importance of Prompt Design**: The effectiveness of self-correction depends not only on the model itself but also on the design of the prompts. If the initial prompt is less detailed than the feedback prompt, any observed improvement is hard to attribute: it may come from the extra detail in the feedback prompt rather than from the self-correction step itself.

### Main Findings

- **Ineffectiveness of Intrinsic Self-Correction**: The self-correction ability of LLMs is very limited without external feedback, and self-correction may even lead to a decline in performance.
- **Dependence on Oracle Labels**: Many existing studies report good results when using oracle labels, but these labels are often unavailable in practical applications, which limits how well those results transfer to real-world use.
- **Limitations of Multi-Agent Debate**: The multi-agent debate method does not show significant advantages over the self-consistency method and performs worse in some cases.
- **Importance of Prompt Design**: The design of the initial prompt greatly affects the model's performance, and improper prompt design may lead to poor self-correction results.

### Conclusion

The paper emphasizes that current large language models find it difficult to perform effective self-correction without external feedback. This suggests that expecting these models to autonomously identify and correct their reasoning errors is unrealistic. The researchers recommend that future studies focus on how to use external feedback to enhance model performance, and that evaluations of self-correction techniques include fair comparisons against baseline methods with comparable inference costs.
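The distinction between intrinsic self-correction, oracle-guided self-correction, and the self-consistency baseline can be made concrete with a minimal Python sketch. Everything here is an illustrative assumption rather than the paper's implementation: the `Model` callable interface, the function names, the wording of the feedback prompt, and the `toy_model` stub, which is deliberately contrived to change a correct answer when asked to review it. It is not empirical data.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical interface: a "model" is any callable mapping a prompt to an answer.
Model = Callable[[str], str]


def self_correct(model: Model, question: str, rounds: int = 1,
                 oracle_label: Optional[str] = None) -> str:
    """Answer a question, then ask the model to review its own answer.

    oracle_label=None : intrinsic self-correction (no external feedback).
    oracle_label set  : revision stops once the answer matches the label,
                        which uses ground truth that is usually unavailable.
    """
    answer = model(question)
    for _ in range(rounds):
        if oracle_label is not None and answer == oracle_label:
            break  # oracle-guided stopping: a correct answer is never revised
        # Assumed feedback prompt; the wording is illustrative only.
        feedback_prompt = (
            f"{question}\nPrevious answer: {answer}\n"
            "Review the previous answer and give a final answer."
        )
        answer = model(feedback_prompt)
    return answer


def self_consistency(model: Model, question: str, samples: int = 3) -> str:
    """Sample several independent answers and return the majority vote."""
    votes = Counter(model(question) for _ in range(samples))
    return votes.most_common(1)[0][0]


def toy_model(prompt: str) -> str:
    """Contrived stub: correct at first, but flips a correct answer on review."""
    if "Previous answer: 4" in prompt:
        return "5"
    return "4"


question = "What is 2 + 2?"
intrinsic = self_correct(toy_model, question, rounds=1)
with_oracle = self_correct(toy_model, question, rounds=1, oracle_label="4")
majority = self_consistency(toy_model, question, samples=3)
print(intrinsic, with_oracle, majority)
```

With this stub, the intrinsic run revises a correct answer into a wrong one, while the oracle-guided run keeps the correct answer only because the label tells it when to stop. The self-consistency call spends its model calls on independent samples instead of revisions, which is the kind of cost-matched baseline the summary says self-correction methods should be compared against.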

Large Language Models Cannot Self-Correct Reasoning Yet
