Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis

Zeping Yu, Sophia Ananiadou

2024-09-21

Abstract: We find arithmetic ability resides within a limited number of attention heads, with each head specializing in distinct operations. To delve into the reason, we introduce the Comparative Neuron Analysis (CNA) method, which identifies an internal logic chain consisting of four distinct stages from input to prediction: feature enhancing with shallow FFN neurons, feature transferring by shallow attention layers, feature predicting by arithmetic heads, and prediction enhancing among deep FFN neurons. Moreover, we identify the human-interpretable FFN neurons within both feature-enhancing and feature-predicting stages. These findings lead us to investigate the mechanism of LoRA, revealing that it enhances prediction probabilities by amplifying the coefficient scores of FFN neurons related to predictions. Finally, we apply our method in model pruning for arithmetic tasks and model editing for reducing gender bias. Code is available at https://github.com/zepingyu0512/arithmetic-mechanism.

Computation and Language

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to explain how large language models (LLMs) carry out arithmetic tasks. Specifically:

1. **Identifying Key Parameters**: The authors find that only a few attention heads significantly affect arithmetic performance, and they examine how these heads work and how they influence the feed-forward network (FFN) layers.
2. **Constructing Internal Logic Chains**: The paper proposes the Comparative Neuron Analysis (CNA) method, which identifies four stages from input to prediction: feature enhancing, feature transferring, feature predicting, and prediction enhancing.
3. **Explaining the LoRA Mechanism**: Using CNA, the paper examines how LoRA works and finds that it raises prediction probability by amplifying the coefficient scores of FFN neurons related to the final prediction.
4. **Applications**: Based on these findings, the paper designs a model pruning method for arithmetic tasks and a model editing method for reducing gender bias.

Overall, the paper uses experiments and neuron-level analysis to reveal the internal mechanisms LLMs use for arithmetic, and applies those findings to pruning and bias reduction.
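The LoRA finding in point 3 can be illustrated with a toy sketch. It assumes the common neuron-level view in which an FFN layer's output is a sum of value vectors, each scaled by a coefficient score; whether this matches the paper's exact formulation is an assumption here. Every name and number below (`ffn_output`, `token_probs`, the two-neuron layer, the three-token vocabulary) is hypothetical and is not taken from the paper or its code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ffn_output(coeffs, values):
    """Toy FFN output: sum over neurons of coefficient * value vector."""
    dim = len(values[0])
    return [sum(m * v[k] for m, v in zip(coeffs, values)) for k in range(dim)]


def token_probs(residual, coeffs, values, unembed):
    """Add the FFN output to the residual stream, then project to the vocabulary."""
    out = ffn_output(coeffs, values)
    hidden = [r + o for r, o in zip(residual, out)]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in unembed]
    return softmax(logits)


# Hypothetical toy setup: 2 hidden dimensions, 2 FFN neurons, 3 vocabulary tokens.
residual = [0.1, 0.1]
values = [[1.0, 0.0], [0.0, 1.0]]                  # neuron 0 aligns with token 0
coeffs = [0.5, 0.5]                                # coefficient scores before amplification
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # one row per token
target = 0

base_prob = token_probs(residual, coeffs, values, unembed)[target]

# Amplify only the coefficient of the neuron related to the target prediction.
amplified_coeffs = [coeffs[0] * 3.0, coeffs[1]]
amplified_prob = token_probs(residual, amplified_coeffs, values, unembed)[target]

print(f"target probability before: {base_prob:.3f}, after: {amplified_prob:.3f}")
```

In this toy setting, scaling the coefficient of the single neuron whose value vector points toward the target token raises that token's probability while the value vectors stay unchanged. This is only an illustration of the general idea, not a reproduction of the paper's analysis.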

Interpreting and Improving Large Language Models in Arithmetic Calculation

Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics

Uncovering the Interpretation of Large Language Models

Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures

An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs

Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks

Arithmetic with language models: From memorization to computation

Language Models are Symbolic Learners in Arithmetic

Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines

OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step

How well do Large Language Models perform in Arithmetic tasks?

Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice

Reverse That Number! Decoding Order Matters in Arithmetic Learning

Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks

RevOrder: A Novel Method for Enhanced Arithmetic in Language Models

Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology

Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback

Arithmetic Reasoning with LLM: Prolog Generation & Permutation

Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From A Psychological Perspective