Abstract: The rapid progress of natural language processing (NLP) systems and the expansion of large language models (LLMs) have opened up numerous opportunities in education and instructional methods. These advances offer the potential for tailored learning experiences and immediate feedback, delivered through accessible and cost-effective services. One notable application area is solving mathematical problems. Mathematical problem-solving requires not only the ability to decipher complex problem statements but also the skill to perform precise arithmetic calculations at each step of the solution. However, the arithmetic capabilities of large language models have received relatively little evaluation. In response, we introduce an extensive mathematics dataset called "MathQuest", sourced from the 11th and 12th standard NCERT Mathematics textbooks. The dataset contains problems of varying complexity and covers a wide range of mathematical concepts. Using this dataset, we fine-tune three prominent LLMs: LLaMA-2, WizardMath, and MAmmoTH. The fine-tuned models serve as benchmarks, and we evaluate their performance on our dataset. Our experiments show that, among the three models, MAmmoTH-13B is the most proficient at solving the presented mathematical problems. Consequently, MAmmoTH-13B establishes itself as a robust and dependable benchmark for NCERT mathematics problems.

SkyMath: Technical Report

Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

KwaiYiiMath: Technical Report

Skywork: A More Open Bilingual Foundation Model

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems

MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

Common 7B Language Models Already Possess Strong Math Capabilities

Llemma: An Open Language Model For Mathematics

Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology

MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs

Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From A Psychological Perspective

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Large Language Models for Mathematical Reasoning: Progresses and Challenges

MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model

Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems

Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks