JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

Kun Zhou, Beichen Zhang, Jiapeng Wang, Zhipeng Chen, Wayne Xin Zhao, Jing Sha, Zhichao Sheng, Shijin Wang, Ji-Rong Wen

2024-05-23

Abstract: Mathematical reasoning is an important capability of large language models (LLMs) for real-world applications. To enhance this capability, existing work either collects large-scale math-related texts for pre-training, or relies on stronger LLMs (e.g., GPT-4) to synthesize massive numbers of math problems. Both types of work generally lead to large costs in training or synthesis. To reduce the cost, based on openly available texts, we propose an efficient approach that trains a small LLM for math problem synthesis, so that it can generate sufficient high-quality pre-training data. To achieve this, we create a dataset using GPT-4 to distill its data synthesis capability into the small LLM. Concretely, we craft a set of prompts based on human education stages to guide GPT-4 to synthesize problems covering diverse math knowledge and difficulty levels. In addition, we adopt a gradient-based influence estimation method to select the most valuable math-related texts. Both are fed into GPT-4 to create the knowledge distillation dataset used to train the small LLM. We leverage the small LLM to synthesize 6 million math problems for pre-training our JiuZhang3.0 model, which only needs to invoke the GPT-4 API 9.3k times and pre-train on 4.6B data. Experimental results show that JiuZhang3.0 achieves state-of-the-art performance on several mathematical reasoning datasets, under both natural language reasoning and tool manipulation settings. Our code and data will be publicly released.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses how to improve the mathematical reasoning ability of large language models (LLMs) at low cost. Existing methods either collect large amounts of math-related text for pre-training or rely on stronger LLMs such as GPT-4 to generate massive numbers of math problems. Both routes incur high training or synthesis costs.

The paper instead trains a small LLM to synthesize math problems, so that high-quality pre-training data can be generated cheaply. Specifically, the researchers craft a set of prompts based on human education stages, so that the synthesized problems cover diverse mathematical knowledge and difficulty levels. They also use a gradient-based influence estimation method to select the most valuable math-related texts. The selected texts and the prompts are fed into GPT-4 to produce a knowledge distillation dataset, which is used to train the small LLM. This small LLM then synthesizes approximately 6 million math problems for pre-training the JiuZhang3.0 model on a corpus of 4.6 billion tokens, with only about 9.3k calls to the GPT-4 API. Experimental results show that JiuZhang3.0 achieves state-of-the-art performance on multiple mathematical reasoning datasets, in both natural language reasoning and tool manipulation settings. Furthermore, compared to existing methods, JiuZhang3.0 reduces the overall cost by approximately 20%. In conclusion, the paper proposes training a small LLM to synthesize math problems as an effective way to enhance the mathematical reasoning ability of LLMs while significantly reducing costs.
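The text-selection step described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes each candidate text and each target (downstream) example has already been reduced to a gradient feature vector, and it scores candidates by mean cosine similarity to the target vectors. The function names `cosine` and `select_top_k`, the scoring rule, and the toy vectors are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_top_k(
    candidate_grads: Sequence[Sequence[float]],
    target_grads: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k candidates whose gradient features are,
    on average, most similar to the target gradient features.

    Assumption: the gradient features are precomputed elsewhere; this
    function only ranks them.
    """
    scored = []
    for index, grad in enumerate(candidate_grads):
        mean_sim = sum(cosine(grad, t) for t in target_grads) / len(target_grads)
        scored.append((mean_sim, index))
    # Highest similarity first; ties broken by lower index for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:k]]


# Toy 2-dimensional gradient features (made-up numbers for illustration).
candidate_grads = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
target_grads = [[1.0, 0.1], [0.9, 0.0]]

selected = select_top_k(candidate_grads, target_grads, k=2)
print(selected)
```

In this toy setup the candidates pointing in roughly the same direction as the target gradients are ranked first. A real pipeline would have to compute the gradient features from a model, which this sketch leaves out.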
