Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models

Yifan Hou, Jiaoda Li, Yu Fei, Alessandro Stolfo, Wangchunshu Zhou, Guangtao Zeng, Antoine Bosselut, Mrinmaya Sachan
DOI: https://doi.org/10.48550/arXiv.2310.14491
2023-10-23
Abstract: Recent work has shown that language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities. However, it is unclear whether LMs perform these tasks by cheating with answers memorized from pretraining corpus, or, via a multi-step reasoning mechanism. In this paper, we try to answer this question by exploring a mechanistic interpretation of LMs for multi-step reasoning tasks. Concretely, we hypothesize that the LM implicitly embeds a reasoning tree resembling the correct reasoning process within it. We test this hypothesis by introducing a new probing approach (called MechanisticProbe) that recovers the reasoning tree from the model's attention patterns. We use our probe to analyze two LMs: GPT-2 on a synthetic task (k-th smallest element), and LLaMA on two simple language-based reasoning tasks (ProofWriter & AI2 Reasoning Challenge). We show that MechanisticProbe is able to detect the information of the reasoning tree from the model's attentions for most examples, suggesting that the LM indeed is going through a process of multi-step reasoning within its architecture in many cases.
Computation and Language
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper asks whether language models (LMs) solve multi-step reasoning tasks by "cheating" with answers memorized from the pretraining corpus, or by following a genuine multi-step reasoning mechanism. Specifically, the authors hypothesize that an LM implicitly embeds a reasoning tree resembling the correct reasoning process, and they design a new probing method (called MechanisticProbe) to recover this tree from the model's attention patterns.

### Main Research Content

1. **Hypothesis and Method**:
   - **Hypothesis**: Language models implicitly construct a reasoning tree when answering multi-step reasoning questions.
   - **Method**: Introduce MechanisticProbe, which recovers the reasoning tree by analyzing the model's attention patterns.
2. **Experimental Setup**:
   - Experiments with GPT-2 on a synthetic task (finding the k-th smallest element in a sequence).
   - Experiments with LLaMA on two natural language reasoning tasks (ProofWriter and the AI2 Reasoning Challenge, ARC).
3. **Experimental Results**:
   - **GPT-2**: For most examples, MechanisticProbe can detect information about the reasoning tree from GPT-2's attention patterns, suggesting that GPT-2 performs multi-step reasoning within its architecture in many cases.
   - **LLaMA**: For the ProofWriter and ARC tasks, MechanisticProbe can also detect information about the reasoning tree, performing particularly well at selecting useful statements and determining reasoning steps.
4. **Further Validation**:
   - Validate the importance of the attention heads identified by MechanisticProbe by observing the performance drop when those heads are pruned.
   - Analyze the correlation between probe scores and model performance and robustness, finding that examples with high probe scores have better prediction accuracy and noise resistance.
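To make the synthetic setup more concrete, the following is a minimal sketch of a k-th smallest element instance and a toy check of whether attention mass falls on the inputs the answer depends on. It is not the authors' MechanisticProbe implementation. The helper names (`make_example`, `top_attended`, `overlap_score`), the choice to label the positions of the k smallest numbers as "useful", and the hand-written attention row are all assumptions made for illustration.

```python
import random


def make_example(n, k, rng):
    """Build one synthetic instance: n distinct integers and a 1-indexed rank k."""
    numbers = rng.sample(range(100), n)
    answer = sorted(numbers)[k - 1]
    # Assumption for illustration: the positions holding the k smallest values
    # are treated as the "useful" inputs that the answer depends on.
    useful = set(sorted(range(n), key=lambda i: numbers[i])[:k])
    return numbers, answer, useful


def top_attended(attention_row, k):
    """Return the k input positions that receive the most attention mass."""
    order = sorted(range(len(attention_row)),
                   key=lambda i: attention_row[i], reverse=True)
    return set(order[:k])


def overlap_score(predicted, gold):
    """Fraction of gold (useful) positions that the prediction recovers."""
    return len(predicted & gold) / len(gold)


if __name__ == "__main__":
    rng = random.Random(0)
    numbers, answer, useful = make_example(n=8, k=3, rng=rng)
    # Hypothetical attention row standing in for real model attention:
    # high weight on useful positions, low weight elsewhere.
    row = [0.9 if i in useful else 0.01 for i in range(len(numbers))]
    score = overlap_score(top_attended(row, 3), useful)
    print(numbers, answer, sorted(useful), score)
```

In a real analysis the attention row would come from the model under study, and the probe described in the paper would be trained and scored over many examples rather than applied to a single hand-built row.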
### Conclusion

Using MechanisticProbe, the paper provides evidence that language models solve multi-step reasoning tasks, in many cases, through an internal multi-step reasoning mechanism rather than by relying solely on answers memorized from the pretraining corpus. This finding matters for understanding how language models work and for developing the next generation of reliable language-based reasoners.