Self-Evaluation Guided Beam Search for Reasoning

Yuxi Xie, Kenji Kawaguchi, Yiran Zhao, Xu Zhao, Min-Yen Kan, Junxian He, Qizhe Xie

2023-10-26

Abstract: Breaking down a problem into intermediate steps has demonstrated impressive performance in Large Language Model (LLM) reasoning. However, the growth of the reasoning chain introduces uncertainty and error accumulation, making it challenging to elicit accurate final results. To tackle this challenge of uncertainty in multi-step reasoning, we introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of LLMs. We propose a decoding algorithm integrating the self-evaluation guidance via stochastic beam search. The self-evaluation guidance serves as a better-calibrated automatic criterion, facilitating an efficient search in the reasoning space and resulting in superior prediction quality. Stochastic beam search balances exploitation and exploration of the search space with temperature-controlled randomness. Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by 6.34%, 9.56%, and 5.46% on the GSM8K, AQuA, and StrategyQA benchmarks, respectively. Experiment results with Llama-2 on arithmetic reasoning demonstrate the efficiency of our method in outperforming the baseline methods with comparable computational budgets. Further analysis in multi-step reasoning finds our self-evaluation guidance pinpoints logic failures and leads to higher consistency and robustness. Our code is publicly available at https://guideddecoding.github.io/.
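The temperature-controlled randomness of stochastic beam search can be illustrated with a minimal Python sketch. It assumes each candidate reasoning chain already carries a scalar score (for example, one combining generation likelihood and self-evaluation confidence) and fills the beam by sampling candidates without replacement, each draw with probability proportional to exp(score / tau). The function name `stochastic_beam_select` and the example scores are hypothetical; the authors' implementation may differ in details such as score normalization or how tau is scheduled across steps.

```python
import math
import random
from typing import List, Optional, Sequence


def stochastic_beam_select(
    scores: Sequence[float],
    k: int,
    tau: float,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Choose k distinct candidate indices at temperature tau.

    Each draw picks one not-yet-chosen candidate with probability
    proportional to exp(score / tau). A small tau approaches greedy top-k
    selection (exploitation); a large tau approaches uniform sampling
    (exploration). This is a hypothetical sketch, not the paper's code.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    rng = rng or random.Random()
    remaining = list(range(len(scores)))
    chosen: List[int] = []
    for _ in range(min(k, len(scores))):
        # Shift by the max score so the largest weight is exp(0) = 1,
        # which avoids overflow and keeps the total weight positive.
        best = max(scores[i] for i in remaining)
        weights = [math.exp((scores[i] - best) / tau) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


# Hypothetical scores for four candidate reasoning chains.
scores = [0.1, 0.9, 0.5, 0.7]
print(stochastic_beam_select(scores, k=2, tau=1e-3, rng=random.Random(0)))
print(stochastic_beam_select(scores, k=2, tau=10.0, rng=random.Random(0)))
```

With `tau=1e-3` the first call returns `[1, 3]`, the two highest-scoring candidates, as greedy beam search would. With `tau=10.0` the weights are nearly uniform, so lower-scoring chains can survive into the beam; that is the exploration side of the trade-off the abstract describes.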

Computation and Language, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

The paper addresses the uncertainty and error accumulation that large language models (LLMs) encounter during multi-step reasoning. Specifically:

- **Challenges in multi-step reasoning**: As the reasoning chain grows, errors made at individual steps accumulate, increasing the uncertainty of the final result.
- **Proposed method**: The authors introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of LLMs, and integrate this self-evaluation guidance into a stochastic beam search decoding algorithm. The self-evaluation serves as a better-calibrated automatic criterion for searching the reasoning space, which improves prediction quality; the method outperforms the corresponding baselines on the GSM8K, AQuA, and StrategyQA benchmarks.

In short, the research aims to reduce error accumulation along the reasoning chain by having the model evaluate each intermediate step during multi-step reasoning, thereby improving the accuracy and robustness of the final answer.
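To make the stepwise self-evaluation concrete, the Python sketch below scores a reasoning chain step by step. It assumes a step's score is a λ-weighted geometric mix of the LM's probability for that step and the model's self-evaluated confidence that the step is correct, P_LM^λ · C^(1 − λ), accumulated over the chain in log space. The function names, λ values, and all probabilities are hypothetical illustrations, not the paper's exact formulation or data.

```python
import math
from typing import List, Tuple

# A reasoning step as (LM probability of the step, self-evaluation confidence
# that the step is correct). Both values are assumed to lie in (0, 1].
Step = Tuple[float, float]


def step_score(lm_prob: float, confidence: float, lam: float) -> float:
    """Log of lm_prob**lam * confidence**(1 - lam) (hypothetical weighting)."""
    return lam * math.log(lm_prob) + (1.0 - lam) * math.log(confidence)


def chain_score(steps: List[Step], lam: float) -> float:
    """Sum of per-step log scores, i.e. the log of their product, so a single
    low-confidence step lowers the score of the whole chain."""
    return sum(step_score(p, c, lam) for p, c in steps)


# Hypothetical chains: chain_b is more likely under the LM at every step,
# but self-evaluation flags its second step as probably wrong.
chain_a = [(0.8, 0.9), (0.7, 0.9)]
chain_b = [(0.9, 0.9), (0.9, 0.1)]

for lam in (1.0, 0.5):
    a, b = chain_score(chain_a, lam), chain_score(chain_b, lam)
    print(f"lam={lam}: chain_a={a:.3f} chain_b={b:.3f}")
```

With `lam=1.0` the score reduces to LM likelihood alone and `chain_b` ranks higher; with `lam=0.5` the flagged step pulls `chain_b` below `chain_a`. Under this assumed weighting, that is how step-level self-evaluation can stop an early logic error from propagating through the beam.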

Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning

Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

Self-Consistency Improves Chain of Thought Reasoning in Language Models

PathFinder: Guided Search over Multi-Step Reasoning Paths

Learning to Reason via Self-Iterative Process Feedback for Small Language Models

BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering

Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies

Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning

Re-Reading Improves Reasoning in Large Language Models

Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning

Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards

Let's Be Self-generated via Step by Step: A Curriculum Learning Approach to Automated Reasoning with Large Language Models

Let's reward step by step: Step-Level reward model as the Navigators for Reasoning

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

Improving Retrieval Augmented Language Model with Self-Reasoning

Automatic Curriculum Expert Iteration for Reliable LLM Reasoning

ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness

Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems

SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding