Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo

Shengyu Feng, Xiang Kong, Shuang Ma, Aonan Zhang, Dong Yin, Chong Wang, Ruoming Pang, Yiming Yang
2024-10-10
Abstract: Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge. Recently, verification has shown promise in improving solution consistency by evaluating generated outputs. However, current verification approaches suffer from sampling inefficiencies, requiring a large number of samples to achieve satisfactory performance. Additionally, training an effective verifier often depends on extensive process supervision, which is costly to acquire. In this paper, we address these limitations by introducing a novel verification method based on Twisted Sequential Monte Carlo (TSMC). TSMC sequentially refines its sampling effort to focus exploration on promising candidates, resulting in more efficient generation of high-quality solutions. We apply TSMC to LLMs by estimating the expected future rewards at partial solutions. This approach results in a more straightforward training target that eliminates the need for step-wise human annotations. We empirically demonstrate the advantages of our method across multiple math benchmarks, and also validate our theoretical analysis of both our approach and existing verification methods.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the underperformance of large language models (LLMs) on multi-step reasoning tasks. It focuses on two main problems:

1. **Low Sampling Efficiency**: Current verification methods only evaluate fully generated solutions and do not improve solution quality during generation. Much of the sampling effort is therefore wasted on partial solutions that are already clearly incorrect, so a large number of samples is needed to obtain a correct solution. This makes the process inefficient and resource-intensive.

2. **Difficulty in Obtaining Process Supervision**: Training powerful verifiers, such as Process Reward Models (PRMs), requires detailed step-by-step supervision. Existing methods rely on either manual annotation or tree search to label intermediate steps. Both are inefficient and scale poorly, which limits their practical use in large-scale tasks.

To address these issues, the paper proposes a new method based on Twisted Sequential Monte Carlo (TSMC). TSMC refines importance sampling by defining a series of intermediate target distributions, one for each resampling step. Retaining the most promising samples at each step reduces the variance of the estimator and improves the overall efficiency of the sampling process. The paper validates TSMC through experiments on multiple mathematical benchmarks and demonstrates its advantages over existing verification methods.
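
The resampling idea can be illustrated with a minimal toy sketch. It is not the paper's implementation, and everything in it is an assumption made for illustration: a "solution" is a list of `horizon` binary steps, it counts as correct only if every step is 1, the base model picks a correct step with probability `p_correct`, and the hypothetical `value_estimate` function returns the exact probability of finishing correctly where the paper would use a learned estimate of expected future reward. Each particle is weighted by the ratio of its value after and before the new step, and particles are resampled in proportion to those weights.

```python
import random


def value_estimate(prefix, horizon, p_correct):
    """Toy stand-in for a learned value function (assumption): the exact
    probability that the finished solution is correct given `prefix`."""
    if any(step == 0 for step in prefix):
        return 0.0  # one wrong step already makes the solution incorrect
    return p_correct ** (horizon - len(prefix))


def twisted_smc(num_particles, horizon, p_correct, rng):
    """Grow `num_particles` partial solutions step by step, resampling after
    each step in proportion to the change in estimated future reward."""
    particles = [[] for _ in range(num_particles)]
    prev_values = [value_estimate([], horizon, p_correct)] * num_particles
    for _ in range(horizon):
        # Propose: extend every partial solution by one step of the base model.
        particles = [p + [1 if rng.random() < p_correct else 0] for p in particles]
        values = [value_estimate(p, horizon, p_correct) for p in particles]
        # Incremental weight: value after the step divided by value before it.
        weights = [v / pv if pv > 0 else 0.0 for v, pv in zip(values, prev_values)]
        if sum(weights) == 0:
            # Every particle has failed; nothing useful to resample from.
            prev_values = values
            continue
        # Resample: promising particles are duplicated, hopeless ones dropped.
        idx = rng.choices(range(num_particles), weights=weights, k=num_particles)
        particles = [list(particles[i]) for i in idx]
        prev_values = [values[i] for i in idx]
    return particles


def naive_sampling(num_samples, horizon, p_correct, rng):
    """Baseline: sample full solutions independently, with no resampling."""
    return [[1 if rng.random() < p_correct else 0 for _ in range(horizon)]
            for _ in range(num_samples)]


def fraction_correct(solutions):
    return sum(all(s == 1 for s in sol) for sol in solutions) / len(solutions)


rng = random.Random(0)
print("TSMC :", fraction_correct(twisted_smc(16, 8, 0.7, rng)))
print("naive:", fraction_correct(naive_sampling(16, 8, 0.7, rng)))
```

Because the toy value function is exact, failed partial solutions receive zero weight and are discarded as soon as they go wrong, whereas the naive baseline spends its full budget completing them. With a learned, imperfect value function the weights would be noisier, so the gap would depend on how accurate that estimate is.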