Abstract: Chain-of-Thought (CoT) prompting methods have enabled large language models (LLMs) to generate reasoning paths and solve math word problems (MWPs). However, they are sensitive to mistakes in the paths, as any mistake can result in an incorrect answer. We propose a novel method named Progressive Rectification Prompting (PRP) that improves average accuracy on eight MWP datasets from 77.3 to 90.5. Given an initial answer from CoT, PRP iterates a verify-then-rectify process to progressively identify incorrect answers and rectify the reasoning paths. Given the most likely correct answer, the LLM predicts a masked numerical value in the question; if the prediction does not match the masked value, the answer is likely incorrect. The LLM is then prompted to re-generate the reasoning path, hinted with a set of incorrect answers, so that it does not repeat previous mistakes. PRP achieves the best performance compared with CoT methods. Our implementation is made publicly available at https://wzy6642.github.io/prp.github.io/.
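The verification step described in the abstract can be sketched in Python as below. This is a minimal sketch, assuming the masked condition is given as a literal number string and the LLM's prediction for X has already been parsed into a float. The helper names (`mask_condition`, `build_verification_question`, `passes_verification`) and the wording of the verification question are hypothetical. The LLM call that answers the verification question is omitted.

```python
import re


def mask_condition(question: str, value: str) -> str:
    """Replace the first standalone occurrence of the number `value` with X."""
    # Lookarounds stop "5" from matching inside "15", "0.5" or "5.25",
    # while still allowing a sentence-ending period right after the number.
    pattern = rf"(?<!\d)(?<!\d\.){re.escape(value)}(?!\.?\d)"
    masked, count = re.subn(pattern, "X", question, count=1)
    if count == 0:
        raise ValueError(f"{value!r} does not appear in the question")
    return masked


def build_verification_question(question: str, value: str, candidate: float) -> str:
    """Pose the masked question conditioned on a candidate answer (hypothetical wording)."""
    return f"{mask_condition(question, value)} Suppose the answer is {candidate:g}. What is X?"


def passes_verification(predicted_x: float, masked_value: float, tol: float = 1e-6) -> bool:
    """A candidate answer passes if the LLM's prediction for X recovers the masked value."""
    return abs(predicted_x - masked_value) <= tol


q = "Tom has 3 apples and buys 4 more. How many apples does he have?"
print(build_verification_question(q, "3", 7))
# → Tom has X apples and buys 4 more. How many apples does he have? Suppose the answer is 7. What is X?
```

In this toy example, a correct candidate of 7 lets a solver of the verification question recover X = 3. A wrong candidate such as 6 would instead lead to X = 2, which `passes_verification(2.0, 3)` rejects.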

What problem does this paper attempt to address?

The paper addresses three main drawbacks of existing Chain-of-Thought (CoT) prompting methods for solving math word problems (MWPs):

1. **Lack of verification**: existing methods cannot effectively check whether a generated answer is correct.
2. **Lack of rectification**: even when an answer is found to be wrong, there is no effective mechanism for finding the correct one.
3. **No way to progressively refine the reasoning path**: these methods are very sensitive to errors in the reasoning process, so even a small mistake in an intermediate step can derail the entire solution.

To address these problems, the authors propose Progressive Rectification Prompting (PRP). PRP iterates a verify-then-rectify process that progressively identifies and corrects wrong answers generated by large language models (LLMs), thereby improving problem-solving accuracy. The workflow of PRP is as follows:

1. **Initialization**: obtain an initial answer from CoT.
2. **Verification module**: check the answer by masking a numerical value in the question and asking the LLM to predict it, given the answer. If the prediction does not match the masked value, the answer is likely incorrect and is added to the set of potential wrong answers.
3. **Rectification module**: re-generate the reasoning path, using the set of potential wrong answers as hints so the LLM does not repeat previous mistakes.
4. **Repeat** steps 2 and 3 until the answer passes verification or the maximum number of iterations is reached.

With this method, PRP raises the average accuracy on eight MWP datasets from 77.3% to 90.5% and outperforms existing zero-shot and few-shot prompting methods on all datasets.
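The four-step workflow above can be sketched as a loop. This is a minimal sketch under stated assumptions: `generate` and `verify` are hypothetical callables standing in for the LLM calls, and the hint template `(The answer is likely not ...)` is an assumed wording. The loop stops at the first answer that passes verification, which follows the description above rather than any confirmed detail of the authors' implementation.

```python
from typing import Callable


def build_rectify_prompt(question: str, wrong: set[float]) -> str:
    """Append the potential wrong answers to the question as a hint (assumed template)."""
    if not wrong:
        return question
    hints = ", ".join(f"{a:g}" for a in sorted(wrong))
    return f"{question} (The answer is likely not {hints}.)"


def progressive_rectification(
    question: str,
    generate: Callable[[str], float],      # hypothetical: CoT call returning a numeric answer
    verify: Callable[[str, float], bool],  # hypothetical: masked-value check for a candidate
    max_iters: int = 5,
) -> tuple[float, set[float]]:
    wrong: set[float] = set()    # C_i: potential wrong answers collected so far
    answer = generate(question)  # a_0^(gen): initial answer from plain CoT
    for _ in range(max_iters):
        if verify(question, answer):
            return answer, wrong  # verification passed: accept the answer
        wrong.add(answer)         # the rejected answer joins the set of wrong answers
        answer = generate(build_rectify_prompt(question, wrong))
    return answer, wrong          # budget exhausted: last answer is returned unverified


# Toy stand-ins for the LLM, so the loop runs without any API.
question = "Tom has 3 apples and buys 4 more. How many apples does he have?"


def fake_generate(prompt: str) -> float:
    # Answers wrongly at first, then answers differently once it sees the hint.
    return 7.0 if "likely not" in prompt else 6.0


def fake_verify(q: str, answer: float) -> bool:
    # Mask the 3: if the total is `answer`, then X = answer - 4 should equal 3.
    return answer - 4 == 3


print(progressive_rectification(question, fake_generate, fake_verify))
# → (7.0, {6.0})
```

Keeping the rejected answers in a set means every earlier mistake, not just the latest one, appears in the hint. This matches the goal of preventing the LLM from repeating previous mistakes.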
### Notation

The paper uses the following notation:

- **Initial answer**: \( a_0^{(gen)} \)
- **Verification question**: \( q_i^{(veri)} \)
- **Verification answer**: \( a_i^{(veri)} \)
- **Masked condition**: \( v_i \)
- **Set of potential wrong answers**: \( C_i \)

For example, the verification module checks whether the verification answer \( a_i^{(veri)} \) equals the masked condition \( v_i \):

\[ a_i^{(veri)} = v_i \]

If the equality holds, the previously generated answer \( a_{i-1}^{(gen)} \) is considered correct. Otherwise, it is added to the set of potential wrong answers, i.e. \( C_i = C_{i-1} \cup \{ a_{i-1}^{(gen)} \} \).

### Summary

PRP overcomes the limitations of existing CoT methods by introducing verification and rectification mechanisms, and it significantly improves accuracy on math word problems.
