Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo

Shengyu Feng, Xiang Kong, Shuang Ma, Aonan Zhang, Dong Yin, Chong Wang, Ruoming Pang, Yiming Yang

2024-10-10

Abstract: Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge. Recently, verification has shown promise in improving solution consistency by evaluating generated outputs. However, current verification approaches suffer from sampling inefficiencies, requiring a large number of samples to achieve satisfactory performance. Additionally, training an effective verifier often depends on extensive process supervision, which is costly to acquire. In this paper, we address these limitations by introducing a novel verification method based on Twisted Sequential Monte Carlo (TSMC). TSMC sequentially refines its sampling effort to focus exploration on promising candidates, resulting in more efficient generation of high-quality solutions. We apply TSMC to LLMs by estimating the expected future rewards at partial solutions. This approach results in a more straightforward training target that eliminates the need for step-wise human annotations. We empirically demonstrate the advantages of our method across multiple math benchmarks, and also validate our theoretical analysis of both our approach and existing verification methods.

Machine Learning

What problem does this paper attempt to address?

The paper addresses the weak performance of large language models (LLMs) on multi-step reasoning tasks, and in particular two limitations of verification-based approaches:

1. **Low Sampling Efficiency**: Current verification methods evaluate only fully generated solutions and do nothing to improve solution quality during generation. Much of the sampling budget is therefore spent completing partial solutions that are already clearly wrong, so many samples are needed before a correct solution appears. The process is inefficient and resource-intensive.
2. **Difficulty in Obtaining Process Supervision**: Training a strong verifier, such as a Process Reward Model (PRM), requires detailed step-by-step supervision. Existing methods obtain annotations for intermediate steps through either manual labeling or tree search; both are inefficient and scale poorly, which limits their use in large-scale tasks.

To address these issues, the paper proposes a method based on Twisted Sequential Monte Carlo (TSMC). TSMC refines importance sampling by defining a sequence of intermediate target distributions and resampling partial solutions against them at each step. Retaining the most promising samples reduces the variance of the estimator and improves the overall efficiency of sampling. The paper validates TSMC through experiments on multiple math benchmarks and demonstrates its advantages over existing verification methods.
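The extend, reweight, and resample loop described above can be illustrated with a toy sketch. It is not the paper's implementation: the "model" is a coin flip that produces a correct step with probability `P_CORRECT`, the twist is the exact expected future reward of a prefix (the paper instead estimates this quantity with a trained model), and the incremental weight is assumed to be the ratio of successive twist values, which is the usual form when the proposal is the base model itself. All names here (`propose_step`, `twist`, `twisted_smc`, `P_CORRECT`, `NUM_STEPS`) are hypothetical.

```python
import random

P_CORRECT = 0.7   # assumed chance that one sampled step is correct
NUM_STEPS = 5     # assumed number of reasoning steps per solution


def propose_step(prefix, rng):
    """Toy base model: 1 marks a correct step, 0 an incorrect one."""
    return 1 if rng.random() < P_CORRECT else 0


def twist(prefix):
    """Exact expected future reward of a partial solution in this toy:
    zero once any step is wrong, else the chance the rest is correct."""
    if not all(prefix):
        return 0.0
    return P_CORRECT ** (NUM_STEPS - len(prefix))


def twisted_smc(num_particles, rng):
    """Extend every particle by one step, reweight by the twist ratio,
    then resample so the budget concentrates on promising prefixes."""
    particles = [[] for _ in range(num_particles)]
    prev = [twist(p) for p in particles]
    for _ in range(NUM_STEPS):
        particles = [p + [propose_step(p, rng)] for p in particles]
        cur = [twist(p) for p in particles]
        # Incremental weight: twist of the new prefix over the old one.
        weights = [c / q if q > 0 else 0.0 for c, q in zip(cur, prev)]
        if sum(weights) == 0:
            # Every particle is already wrong; nothing useful to resample.
            prev = cur
            continue
        idx = rng.choices(range(num_particles), weights=weights, k=num_particles)
        particles = [list(particles[i]) for i in idx]
        prev = [cur[i] for i in idx]
    return particles


def naive_sampling(num_samples, rng):
    """Baseline: sample complete solutions independently, verify at the end."""
    samples = []
    for _ in range(num_samples):
        prefix = []
        for _ in range(NUM_STEPS):
            prefix.append(propose_step(prefix, rng))
        samples.append(prefix)
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    for name, sols in [("naive", naive_sampling(32, rng)),
                       ("twisted SMC", twisted_smc(32, rng))]:
        correct = sum(all(s) for s in sols)
        print(f"{name}: {correct}/{len(sols)} fully correct solutions")
```

Because the toy twist is exact and drops to zero at the first wrong step, resampling discards wrong prefixes immediately; with a learned, imperfect twist the pruning would be softer, and the final samples would still need to be scored by the reward.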

Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification

Enhancing Mathematical Reasoning in LLMs by Stepwise Correction

GraphReason: Enhancing Reasoning Capabilities of Large Language Models through A Graph-Based Verification Approach

Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models

Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors

Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning

No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function

General Purpose Verification for Chain of Thought Prompting

VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

Kwai-STaR: Transform LLMs into State-Transition Reasoners

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model

Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems

Advancing Process Verification for Large Language Models via Tree-Based Preference Learning

VerityMath: Advancing Mathematical Reasoning by Self-Verification Through Unit Consistency

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

Improve Mathematical Reasoning in Language Models by Automated Process Supervision