Abstract: Recent work has shown that language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities. However, it is unclear whether LMs perform these tasks by cheating with answers memorized from the pretraining corpus or via a multi-step reasoning mechanism. In this paper, we try to answer this question by exploring a mechanistic interpretation of LMs for multi-step reasoning tasks. Concretely, we hypothesize that the LM implicitly embeds a reasoning tree resembling the correct reasoning process within it. We test this hypothesis by introducing a new probing approach (called MechanisticProbe) that recovers the reasoning tree from the model's attention patterns. We use our probe to analyze two LMs: GPT-2 on a synthetic task (k-th smallest element), and LLaMA on two simple language-based reasoning tasks (ProofWriter & AI2 Reasoning Challenge). We show that MechanisticProbe is able to detect the information of the reasoning tree from the model's attentions for most examples, suggesting that the LM indeed is going through a process of multi-step reasoning within its architecture in many cases.

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

This paper asks whether language models (LMs) are "cheating" on multi-step reasoning tasks by recalling answers memorized from their pre-training corpora, or whether they solve these tasks through a genuine multi-step reasoning mechanism. Specifically, the authors hypothesize that a language model implicitly embeds a reasoning tree resembling the correct reasoning process, and they design a new probing method (called MechanisticProbe) to recover this tree from the model's attention patterns.

### Main Research Content

1. **Hypothesis and Method**:
   - **Hypothesis**: Language models implicitly construct a reasoning tree when answering multi-step reasoning questions.
   - **Method**: Introduce MechanisticProbe to recover the reasoning tree by analyzing the model's attention patterns.
2. **Experimental Setup**:
   - Conduct experiments using GPT-2 on a synthetic task (finding the k-th smallest element in a sequence).
   - Conduct experiments using LLaMA on two natural language reasoning tasks (ProofWriter and AI2 Reasoning Challenge, ARC).
3. **Experimental Results**:
   - **GPT-2**: For most examples, MechanisticProbe can detect information about the reasoning tree from GPT-2's attention patterns, suggesting that GPT-2 performs multi-step reasoning within its architecture in many cases.
   - **LLaMA**: For the ProofWriter and ARC tasks, MechanisticProbe can also detect information about the reasoning tree, and it does particularly well at selecting useful statements and determining reasoning steps.
4. **Further Validation**:
   - Validate the importance of the attention heads identified by MechanisticProbe by pruning them and observing the resulting drop in task performance.
   - Analyze the correlation between probe scores and model performance and robustness, finding that examples with high probe scores have better prediction accuracy and noise resistance.
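The probing idea above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the attention values are simulated rather than read from GPT-2, the nearest-neighbour classifier is only one plausible choice of probe, and the labelling rule (the k smallest numbers count as the "useful" inputs) is an assumption made for illustration. All function names (`make_example`, `fake_attention`, `knn_predict`, `run`) are hypothetical.

```python
import numpy as np

def make_example(rng, n=8, k=3, vocab=100):
    """Build one k-th smallest instance: distinct ints, the answer, and labels."""
    xs = rng.choice(vocab, size=n, replace=False)
    answer = np.sort(xs)[k - 1]
    # Assumption for this sketch: the k smallest inputs are the "useful" ones.
    useful = np.zeros(n, dtype=int)
    useful[np.argsort(xs)[:k]] = 1
    return xs, answer, useful

def fake_attention(rng, useful, n_heads=4, signal=1.5):
    """Simulated stand-in for real attention: rows are positions, columns are heads."""
    noise = rng.normal(0.0, 0.5, size=(len(useful), n_heads))
    boost = np.zeros(n_heads)
    boost[:2] = signal  # only the first two (hypothetical) heads carry signal
    return noise + useful[:, None] * boost

def knn_predict(train_x, train_y, test_x, k=5):
    """Plain k-nearest-neighbour vote using squared Euclidean distance."""
    dists = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest].mean(axis=1)
    return (votes > 0.5).astype(int)

def run(seed=0, n_train=200, n_test=50):
    """Train the probe on simulated attention and return held-out accuracy."""
    rng = np.random.default_rng(seed)

    def collect(num_examples):
        feats, labels = [], []
        for _ in range(num_examples):
            _, _, useful = make_example(rng)
            feats.append(fake_attention(rng, useful))
            labels.append(useful)
        return np.concatenate(feats), np.concatenate(labels)

    train_x, train_y = collect(n_train)
    test_x, test_y = collect(n_test)
    pred = knn_predict(train_x, train_y, test_x)
    return float((pred == test_y).mean())

if __name__ == "__main__":
    print("probe accuracy on simulated attention:", run())
```

A probe accuracy well above the majority-class baseline would indicate that the attention features carry information about which inputs matter. With a real model, `fake_attention` would be replaced by attention weights extracted from the network, and the labels would come from the task's ground-truth reasoning tree.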
### Conclusion

The paper, through the MechanisticProbe method, provides evidence that language models solve multi-step reasoning tasks through internal multi-step reasoning mechanisms rather than merely relying on memorizing answers from pre-training corpora. This finding is significant for understanding the working principles of language models and developing the next generation of reliable language-based reasoners.

Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
