Get an A in Math: Progressive Rectification Prompting

Zhenyu Wu, Meng Jiang, Chao Shen
2023-12-12
Abstract: Chain-of-Thought (CoT) prompting methods have enabled large language models (LLMs) to generate reasoning paths and solve math word problems (MWPs). However, they are sensitive to mistakes in the paths, as any mistake can result in an incorrect answer. We propose a novel method named Progressive Rectification Prompting (PRP) to improve average accuracy on eight MWP datasets from 77.3 to 90.5. Given an initial answer from CoT, PRP iterates a verify-then-rectify process to progressively identify incorrect answers and rectify the reasoning paths. With the most likely correct answer, the LLM predicts a masked numerical value in the question; if the prediction does not match the masked value, the answer is likely incorrect. Then the LLM is prompted to re-generate the reasoning path hinted with a set of incorrect answers to prevent itself from repeating previous mistakes. PRP achieves the best performance compared against the CoT methods. Our implementation is made publicly available at https://wzy6642.github.io/prp.github.io/.
Computation and Language
What problem does this paper attempt to address?
This paper addresses three main drawbacks of existing Chain-of-Thought (CoT) prompting methods when solving math word problems (MWPs):

1. **Lack of verification**: Existing methods cannot effectively check whether a generated answer is correct.
2. **Lack of rectification**: Even when an answer is found to be wrong, there is no effective mechanism for finding the correct one.
3. **Lack of progressive refinement of the reasoning path**: These methods are very sensitive to errors in the reasoning process; even a small error can make the final answer wrong.

To solve these problems, the authors propose a new method, Progressive Rectification Prompting (PRP). PRP progressively identifies and rectifies wrong answers generated by large language models (LLMs) through an iterative verify-then-rectify process, thereby improving problem-solving accuracy. The workflow of PRP is as follows:

1. **Initialization**: Obtain an initial answer from CoT prompting.
2. **Verification module**: Mask a numerical value in the question and have the LLM predict it, given the current answer. If the prediction does not match the masked value, the answer is likely incorrect and is added to the set of potential wrong answers.
3. **Rectification module**: Prompt the LLM to regenerate the reasoning path, using the set of potential wrong answers as a hint so that it does not repeat previous mistakes.
4. **Iteration**: Repeat steps 2 and 3 until an answer passes verification or the maximum number of iterations is reached.

With this method, PRP improves average accuracy on eight MWP datasets from 77.3% to 90.5% and outperforms the existing zero-shot and few-shot prompting methods on all datasets.
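The verify-then-rectify workflow above can be sketched as a short loop. This is a minimal illustration, not the authors' implementation: the function names (`mask_first_number`, `progressive_rectify`), the choice to mask the first number in the question, the numeric tolerance, and the two toy stand-ins for LLM calls are all assumptions made for this example.

```python
import re
from typing import Callable, List, Optional, Tuple

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def mask_first_number(question: str, mask: str = "X") -> Tuple[str, Optional[float]]:
    """Hide the first number in the question (assumed masking strategy).

    Returns the masked question and the hidden value, or None if no number exists.
    """
    match = NUMBER.search(question)
    if match is None:
        return question, None
    masked = question[:match.start()] + mask + question[match.end():]
    return masked, float(match.group())


def progressive_rectify(
    question: str,
    generate: Callable[[str, List[float]], float],   # hypothetical LLM call: (question, wrong answers) -> answer
    predict_masked: Callable[[str, float], float],   # hypothetical LLM call: (masked question, answer) -> masked value
    max_iters: int = 5,
    tol: float = 1e-6,
) -> float:
    masked_question, hidden = mask_first_number(question)
    wrong_answers: List[float] = []              # set of potential wrong answers
    answer = generate(question, wrong_answers)   # initial answer
    if hidden is None:
        return answer                            # nothing to mask, so nothing to verify
    for _ in range(max_iters):
        # Verification: does the answer let us recover the masked value?
        predicted = predict_masked(masked_question, answer)
        if abs(predicted - hidden) <= tol:
            return answer
        # Rectification: record the failure and regenerate with it as a hint.
        wrong_answers.append(answer)
        answer = generate(question, wrong_answers)
    return answer                                # last candidate; it was not verified


# Toy stand-ins for the LLM, for illustration only.
def toy_generate(question: str, wrong: List[float]) -> float:
    return 8.0 if not wrong else 7.0             # first guess is wrong, second is right


def toy_predict_masked(masked_question: str, answer: float) -> float:
    return answer - 4.0                          # "X apples plus 4 more gives `answer`"


QUESTION = "Tom has 3 apples and buys 4 more. How many apples does he have?"
result = progressive_rectify(QUESTION, toy_generate, toy_predict_masked)
```

With the toy stand-ins, the first answer (8) fails verification because it implies a masked value of 4 rather than 3, so it is added to the wrong-answer list and the second answer (7) is accepted. In practice both callables would wrap prompts to an LLM, and the final return value may be an unverified candidate if the iteration budget runs out.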
### Formula representation

The main symbols used in the paper are:

- **Initial answer**: \( a_0^{(gen)} \)
- **Verification question**: \( q_i^{(veri)} \)
- **Verification answer**: \( a_i^{(veri)} \)
- **Masked condition**: \( v_i \)
- **Set of potential wrong answers**: \( C_i \)

In the verification module, the check that the verification answer \( a_i^{(veri)} \) equals the masked condition \( v_i \) is written as:

\[ a_i^{(veri)} = v_i \]

If the equation holds, the previously generated answer \( a_{i - 1}^{(gen)} \) is considered correct; otherwise, it is added to the set of potential wrong answers \( C_i \).

### Summary

PRP overcomes the limitations of existing CoT methods by introducing verification and rectification mechanisms, and it significantly improves accuracy on math word problems.