MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs

Zimu Lu, Aojun Zhou, Houxing Ren, Ke Wang, Weikang Shi, Junting Pan, Mingjie Zhan, Hongsheng Li

2024-09-11

Abstract: Large language models (LLMs) have exhibited great potential in mathematical reasoning. However, there remains a performance gap in this area between existing open-source models and closed-source models such as GPT-4. In this paper, we introduce MathGenie, a novel method for generating diverse and reliable math problems from a small-scale problem-solution dataset (denoted as seed data). We augment the ground-truth solutions of our seed data and train a back-translation model to translate the augmented solutions back into new questions. Subsequently, we generate code-integrated solutions for the new questions. To ensure the correctness of the code-integrated solutions, we employ a rationale-based strategy for solution verification. Various pretrained models, ranging from 7B to 70B parameters, are trained on the newly curated data to test the effectiveness of the proposed augmentation technique, resulting in a family of models known as MathGenieLM. These models consistently outperform previous open-source models across five representative mathematical reasoning datasets, achieving state-of-the-art performance. In particular, MathGenieLM-InternLM2 achieves an accuracy of 87.7% on GSM8K and 55.7% on MATH, securing the best overall score among open-source language models.
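The abstract describes a four-stage data flow: augment seed solutions, back-translate them into new questions, generate code-integrated solutions, and keep only verified ones. The Python sketch below is a minimal, hypothetical rendering of that flow only. The callables `augment_solution`, `back_translate`, `solve_with_code`, and `verify` are illustrative placeholders for model calls, not the authors' code or API, and the toy lambdas in the demo stand in for real LLMs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Example:
    question: str
    solution: str


def mathgenie_pipeline(
    seed: Iterable[Example],
    augment_solution: Callable[[str], List[str]],  # hypothetical: solution -> augmented solutions
    back_translate: Callable[[str], str],          # hypothetical: solution -> new question
    solve_with_code: Callable[[str], str],         # hypothetical: question -> code-integrated solution
    verify: Callable[[str, str], bool],            # hypothetical: rationale-based verifier
) -> List[Example]:
    """Sketch of the data flow in the abstract; all callables are placeholders."""
    curated: List[Example] = []
    for ex in seed:
        # 1. Augment the ground-truth solution of each seed example.
        for new_solution in augment_solution(ex.solution):
            # 2. Back-translate the augmented solution into a new question.
            new_question = back_translate(new_solution)
            # 3. Generate a code-integrated solution for the new question.
            candidate = solve_with_code(new_question)
            # 4. Keep the pair only if the verifier accepts the solution.
            if verify(new_question, candidate):
                curated.append(Example(new_question, candidate))
    return curated


if __name__ == "__main__":
    # Toy stand-ins for model calls, for illustration only.
    seed = [Example("What is 2 + 3?", "2 + 3 = 5")]
    data = mathgenie_pipeline(
        seed,
        augment_solution=lambda s: [s + " (variant A)", s + " (variant B)"],
        back_translate=lambda s: "Question derived from: " + s,
        solve_with_code=lambda q: "print(5)  # code-integrated answer to: " + q,
        verify=lambda q, sol: "variant A" in q,
    )
    for ex in data:
        print(ex.question, "->", ex.solution)
```

Passing the model calls in as functions keeps the sketch independent of any particular LLM backend; only the ordering of the stages is taken from the abstract.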

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper aims to narrow the mathematical-reasoning performance gap between open-source large language models (LLMs) and closed-source models such as GPT-4. To do so, it proposes MathGenie, a method for generating diverse and reliable math problems from a small-scale problem-solution dataset. The pipeline iteratively augments the seed solutions, back-translates the augmented solutions into new problems, generates code-integrated solutions for those problems, and keeps only the solutions that pass verification, which helps ensure the quality of both the new problems and their solutions. Models trained on the resulting data (MathGenieLM) consistently outperform previous open-source models across five mathematical reasoning datasets; in particular, MathGenieLM-InternLM2 reaches 87.7% on GSM8K and 55.7% on MATH, the best overall score among open-source language models. The paper also shows that the method is effective across different pretrained models and validates the contribution of each component through ablation experiments.
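One plausible reading of "iteratively" augmenting the seed solutions is a chain in which each round rewrites the previous round's output rather than the original solution. The sketch below assumes that reading; `iterative_augment` and its `rewrite` argument are hypothetical names, and `rewrite` stands in for a prompted LLM call rather than anything specified in the summary.

```python
from typing import Callable, List


def iterative_augment(solution: str, rewrite: Callable[[str], str], rounds: int) -> List[str]:
    """Hypothetical sketch: chain `rounds` rewrites, each applied to the previous output.

    `rewrite` stands in for a prompted LLM call; it is an assumption, not the paper's API.
    """
    variants: List[str] = []
    current = solution
    for _ in range(rounds):
        # Rewrite the latest variant, not the seed, so variants drift further each round.
        current = rewrite(current)
        variants.append(current)
    return variants
```

For example, with a toy `rewrite` that appends a prime, `iterative_augment("x", rewrite, 3)` returns `["x'", "x''", "x'''"]`, showing how each variant builds on the one before it.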

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning

MathScale: Scaling Instruction Tuning for Mathematical Reasoning

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

Neuro-Symbolic Data Generation for Math Reasoning

AI-Assisted Generation of Difficult Math Questions

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning

Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline

ControlMath: Controllable Data Generation Promotes Math Generalist Models

LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data

Augmenting Math Word Problems via Iterative Question Composing

PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation