SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

Dian Yu, Baolin Peng, Ye Tian, Linfeng Song, Haitao Mi, Dong Yu

2024-08-28

Abstract: There is a growing trend of teaching large language models (LLMs) to solve mathematical problems through coding. Existing studies primarily focus on prompting powerful, closed-source models to generate seed training data followed by in-domain data augmentation, equipping LLMs with considerable capabilities for code-aided mathematical reasoning. However, continually training these models on augmented data derived from a few datasets such as GSM8K may impair their generalization abilities and restrict their effectiveness to a narrow range of question types. Conversely, the potential of improving such LLMs by leveraging large-scale, expert-written, diverse math question-answer pairs remains unexplored. To utilize these resources and tackle unique challenges such as code response assessment, we propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation. We also explore different alignment algorithms with self-generated instruction/preference data to foster continuous improvement. Experiments across both in-domain (up to +5.7%) and out-of-domain (+4.4%) benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.

Computation and Language

What problem does this paper attempt to address?

The paper addresses the limited accuracy of large language models (LLMs) on mathematical problems, especially in out-of-domain scenarios. Existing research mainly prompts powerful closed-source models to generate seed training data and then strengthens LLMs' code-assisted mathematical reasoning through in-domain data augmentation. However, this approach may reduce the model's generalization ability and limit its effectiveness to a narrow range of question types. The paper proposes a new paradigm that leverages large-scale, expert-written mathematical question-answer resources to improve LLMs' code-assisted mathematical reasoning. The paradigm includes the following key steps:

1. **Constructing the Initial Model**: Fine-tune the LLM on high-quality seed data to obtain the initial model.
2. **Establishing a Multi-purpose Code Evaluation Model**: To make better use of diverse mathematical Q&A data, build a code evaluation (critic) model that assesses the consistency between code execution results and reference answers.
3. **Code Data Generation**: Sample code responses from the current policy model and use the code evaluation model to keep only the correct ones.
4. **Self-Improvement Mechanism**: Run supervised fine-tuning (SFT) on unseen data and further optimize the model with preference learning algorithms such as DPO.

Experimental results show that this paradigm improves LLMs' performance on a range of mathematical benchmarks, with reported gains of up to +5.7% in-domain and +4.4% out-of-domain. The approach also reduces the need for language-specific mathematical datasets, offering a degree of cross-language generalizability.
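The filtering and preference-data steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the learned critic model is replaced here by a simple numeric or string comparison (`toy_critic`), and all function names, the in-process `exec` runner, and the `prompt`/`chosen`/`rejected` record layout are assumptions made for the example.

```python
import contextlib
import io
from itertools import product
from typing import List, Optional, Tuple


def run_code(code: str) -> Optional[str]:
    """Execute a candidate program and return its stripped stdout, or None on error.

    Assumption: in-process exec is used only to keep the sketch short;
    a real pipeline would run untrusted code in a sandbox with a timeout.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception:
        return None
    return buffer.getvalue().strip()


def toy_critic(result: Optional[str], reference: str) -> bool:
    """Hypothetical stand-in for the learned critic model.

    Compares numerically with a small tolerance when both sides parse
    as floats, otherwise falls back to exact string comparison.
    """
    if result is None:
        return False
    try:
        return abs(float(result) - float(reference)) < 1e-6
    except ValueError:
        return result == reference.strip()


def split_samples(samples: List[str], reference: str) -> Tuple[List[str], List[str]]:
    """Partition sampled code responses into critic-accepted and critic-rejected."""
    accepted, rejected = [], []
    for code in samples:
        target = accepted if toy_critic(run_code(code), reference) else rejected
        target.append(code)
    return accepted, rejected


def preference_pairs(question: str, accepted: List[str], rejected: List[str]) -> List[dict]:
    """Pair every accepted response with every rejected one for preference learning."""
    return [
        {"prompt": question, "chosen": good, "rejected": bad}
        for good, bad in product(accepted, rejected)
    ]


if __name__ == "__main__":
    question = "What is 3 times 4?"
    samples = ["print(3 * 4)", "print(3 + 4)", "print(1 / 0)", "print(12.0)"]
    accepted, rejected = split_samples(samples, reference="12")
    print(len(accepted), "accepted;", len(rejected), "rejected")
    print(len(preference_pairs(question, accepted, rejected)), "preference pairs")
```

In this sketch the accepted responses would feed supervised fine-tuning, while the accepted/rejected pairs would serve as input to a preference learning algorithm such as DPO. Pairing every accepted response with every rejected one is just one possible choice; the summary does not state how pairs are actually formed.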

MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning

Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning

DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning

CoinMath: Harnessing the Power of Coding Instruction for Math LLMs

Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching

SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models

MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning

MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline

S$^3$c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners

Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems

INC-Math: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models

LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs

Solving Math Word Problems by Combining Language Models With Symbolic Solvers

AI-Assisted Generation of Difficult Math Questions

Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction

MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data