Abstract: Math reasoning is a highly active area of Large Language Model (LLM) research because it is a hallmark of artificial intelligence. However, few works have explored how math reasoning is encoded within LLM parameters and if it is a skill that can be isolated within a model. Doing so could allow targeted intervention to improve math performance without altering non-math behavior and foster understanding of how models encode math reasoning. We introduce Math Neurosurgery (MathNeuro), a method for isolating math-specific parameters in LLMs using only forward passes. MathNeuro builds on existing work by using weights and activations to calculate parameter importance, but isolates math-specific parameters by removing those important for general language tasks. Pruning parameters MathNeuro identifies deletes an LLM's math reasoning ability without destroying its general language ability. Scaling these parameters by a small constant improves a pretrained or instruction-tuned LLM's performance by 4-17% on GSM8K while leaving non-math behavior unaltered. MathNeuro is also data efficient: most of its effectiveness holds when identifying math-specific parameters using a single sample. MathNeuro highlights the potential for future work to intervene on math-specific parameters.

What problem does this paper attempt to address?

This paper asks how mathematical reasoning ability is encoded in the parameters of large language models (LLMs) and whether this ability can be separated from other language tasks. Specifically, the authors propose a method named "MathNeuro" that isolates the math-specific parameters in LLMs using only forward passes. The method supports three goals:

1. **Targeted Intervention**: By identifying and adjusting the parameters related to mathematical reasoning, improve the model's performance on mathematical tasks without affecting its performance on non-mathematical tasks.
2. **Understanding the Internal Mechanisms of the Model**: Gain an in-depth understanding of how LLMs encode mathematical reasoning ability, providing a basis for further research and optimization.
3. **Data Efficiency**: Show that math-specific parameters can be identified effectively even from a small number of samples, which matters in practical applications.

### Main Contributions

- **Designed MathNeuro**: This is the first comprehensively evaluated parameter identification method specifically designed to isolate the mathematical reasoning ability in LLMs.
- **Verified the Effectiveness of the Method**: Deleting the parameters identified by MathNeuro removed the model's mathematical reasoning ability, demonstrating their importance; amplifying these parameters improved the performance of multiple models of different scales on the GSM8K dataset by 4-17%.
- **Verified the Impact on Non-mathematical Tasks**: Pruning or amplifying these parameters had little impact on the performance of non-mathematical tasks, similar to the effect of random perturbations.

### Method Overview

The core steps of the MathNeuro method are as follows:

1. **Identify Important Parameters**: Calculate the importance score of each parameter separately on data from mathematical tasks and on data from non-mathematical tasks.
2. **Isolate Math-specific Parameters**: From the most important parameters identified on the mathematical tasks, remove those that are also important on non-mathematical tasks, leaving the math-specific parameters.

### Experimental Results

- **Pruning Experiment**: After deleting the parameters identified by MathNeuro, the model's mathematical reasoning ability decreased significantly, while the performance on non-mathematical tasks decreased to a lesser extent, close to the effect of random pruning.
- **Amplification Experiment**: After amplifying the parameters identified by MathNeuro, the model's performance on the GSM8K dataset improved by 4-17%, while the impact on the performance of non-mathematical tasks was small.
- **Single-sample Experiment**: Even with only one sample, MathNeuro can still effectively identify math-specific parameters, although the effect is slightly worse than when using more samples.
- **Parameter Consistency**: Across multiple experiments, MathNeuro consistently identified the same math-specific parameters, and these parameters are relatively evenly distributed across the decoder blocks of the model.

### Conclusion

The MathNeuro method effectively isolates the math-specific parameters in LLMs and also suggests new directions for future model optimization and research. With it, researchers can intervene more precisely on a specific ability of the model without affecting its overall performance.
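The two steps in the method overview, together with the pruning and scaling interventions, can be illustrated with a minimal NumPy sketch for a single weight matrix. This is not the authors' implementation. The summary only says that importance is computed from weights and activations, so the scoring rule here (absolute weight times the L2 norm of the matching input activation) is an assumption, as are the function names, the `fraction` of top parameters kept, and the scaling `factor`.

```python
import numpy as np


def importance(weight, activations):
    """Assumed score: |W_ij| times the L2 norm of input feature j.

    weight:      (out_features, in_features)
    activations: (n_tokens, in_features), collected from forward passes
    """
    act_norm = np.linalg.norm(activations, axis=0)  # shape (in_features,)
    return np.abs(weight) * act_norm  # broadcasts across rows


def top_mask(scores, fraction):
    """Boolean mask for the top `fraction` of scores in the matrix."""
    k = max(1, int(round(fraction * scores.size)))
    threshold = np.partition(scores.ravel(), -k)[-k]  # k-th largest score
    return scores >= threshold


def math_specific_mask(weight, math_acts, general_acts, fraction=0.1):
    """Top parameters for math inputs, minus those also top for general inputs."""
    math_top = top_mask(importance(weight, math_acts), fraction)
    general_top = top_mask(importance(weight, general_acts), fraction)
    return math_top & ~general_top


def prune(weight, mask):
    """Zero out the selected parameters (the pruning experiment)."""
    return np.where(mask, 0.0, weight)


def scale(weight, mask, factor=1.1):
    """Multiply the selected parameters by a small constant (hypothetical value)."""
    return np.where(mask, weight * factor, weight)
```

In a real model the same procedure would be applied to each weight matrix, with `math_acts` and `general_acts` taken from that layer's inputs on math and non-math samples. When both activation sets are identical, the mask is empty, because every parameter that is important for math is also important for the general data.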

Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes

Neuro-Symbolic Data Generation for Math Reasoning

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

Learning Non-linguistic Skills without Sacrificing Linguistic Proficiency

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics

Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent

Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From A Psychological Perspective

Reasoning in Large Language Models Through Symbolic Math Word Problems

INC-Math: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models

Interpreting and Improving Large Language Models in Arithmetic Calculation

An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning

Scaling Relationship on Learning Mathematical Reasoning with Large Language Models

MathPrompter: Mathematical Reasoning using Large Language Models

Benchmarking Large Language Models for Math Reasoning Tasks

MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline

Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems

MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems

Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations