Abstract: The advancement of large language models (LLMs) for real-world applications hinges critically on enhancing their reasoning capabilities. In this work, we explore the reasoning abilities of LLMs through their geometrical understanding. We establish a connection between the expressive power of LLMs and the density of their self-attention graphs. Our analysis demonstrates that the density of these graphs defines the intrinsic dimension of the inputs to the MLP blocks. We demonstrate through theoretical analysis and toy examples that a higher intrinsic dimension implies a greater expressive capacity of the LLM. We further provide empirical evidence linking this geometric framework to recent advancements in methods aimed at enhancing the reasoning capabilities of LLMs.

What problem does this paper attempt to address?

The paper attempts to address the problem of improving the reasoning capabilities of large language models (LLMs) in practical applications. Specifically, the authors explore the reasoning capabilities of LLMs through their geometric understanding and establish a connection between model expressiveness and the density of self-attention graphs. They argue that the density of these graphs defines the intrinsic dimension of the inputs to the multi-layer perceptron (MLP) blocks, and a higher intrinsic dimension implies stronger expressiveness. Additionally, they provide theoretical analysis and experimental results demonstrating the association between the geometric framework and recent methods for enhancing LLMs' reasoning capabilities.

### Main Contributions of the Paper

1. **Establishing a Geometric Perspective**: The authors investigate the reasoning capabilities of LLMs from a geometric perspective, establishing a relationship between model expressiveness and the density of self-attention graphs.
2. **Theoretical Analysis and Experimental Validation**: Through theoretical analysis and experimental results, they demonstrate how the density of self-attention graphs affects the expressiveness of MLPs.
3. **Impact of Intrinsic Dimension**: The study shows that increasing the input sequence length and the number of attention heads can enhance the density of self-attention graphs, thereby improving the model's reasoning capabilities.
4. **Experimental Design**: A series of experiments validate the correlation between geometric properties and LLMs' reasoning capabilities, particularly showing that as the number of examples in the prompt increases, the model's intrinsic dimension also increases, ultimately enhancing reasoning performance.

### Key Findings

- **Density of Self-Attention Graphs**: The density of self-attention graphs defines the intrinsic dimension of the inputs to the MLP blocks, with higher density implying stronger expressiveness.
- **Intrinsic Dimension and Reasoning Capability**: Increasing the model's intrinsic dimension can significantly enhance its reasoning capabilities, especially in the final layer of the model.
- **Experimental Validation**: Experiments on the GSM8K-Zero dataset confirm that increasing the intrinsic dimension indeed improves the model's accuracy in providing correct answers.

### Experimental Methods

- **Dataset**: The GSM8K-Zero dataset is used to evaluate the model's performance in different few-shot scenarios.
- **Experimental Setup**: By increasing prefix tokens and random tokens, the changes in the model's intrinsic dimension and their impact on reasoning capabilities are observed.
- **Results Analysis**: The results show that increasing the intrinsic dimension, particularly in the final layer of the model, can significantly improve the model's reasoning performance.

### Conclusion

The paper delves into the reasoning capabilities of LLMs from a geometric perspective, proposing a method to increase the model's intrinsic dimension, thereby providing new insights and experimental evidence for enhancing LLMs' reasoning capabilities. These findings not only deepen the understanding of the internal mechanisms of LLMs but also provide theoretical support for further optimization and improvement of LLMs.
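The notion of self-attention graph density used in this summary can be illustrated with a toy sketch. It treats each attention weight above a small threshold as an edge from a token to a context position, then counts edges per token and overall. The threshold `eps`, the function names, and the edge-counting definition itself are assumptions made for illustration; they are not taken from the paper and may differ from its actual estimator of intrinsic dimension.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(q, k):
    """q, k: arrays of shape (heads, tokens, dim).
    Returns causal attention weights of shape (heads, tokens, tokens)."""
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    t = q.shape[1]
    future = np.triu(np.ones((t, t), dtype=bool), k=1)  # positions after the query
    scores = np.where(future, -np.inf, scores)
    return softmax(scores, axis=-1)

def edges_per_token(attn, eps=0.05):
    # Assumed proxy: number of attention weights above eps, summed over heads.
    return (attn > eps).sum(axis=(0, 2))

def graph_density(attn, eps=0.05):
    # Fraction of allowed (causal) connections whose weight exceeds eps.
    heads, t, _ = attn.shape
    allowed = heads * t * (t + 1) // 2
    return (attn > eps).sum() / allowed

# Toy comparison: the same context length with 1 head versus 4 heads.
rng = np.random.default_rng(0)
for heads in (1, 4):
    q = rng.normal(size=(heads, 16, 8))
    k = rng.normal(size=(heads, 16, 8))
    attn = causal_attention(q, k)
    print(heads, edges_per_token(attn)[-1], round(float(graph_density(attn)), 3))
```

Under this proxy, the edge count available to a token is bounded by the number of heads times the number of visible context positions, which is consistent with the summary's statement that longer inputs and more heads can increase graph density.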

Reasoning in Large Language Models: A Geometric Perspective

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning

Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models

Towards Reasoning in Large Language Models: A Survey

Can Large Language Models Reason? A Characterization via 3-SAT

Can Large Language Models Act as Symbolic Reasoners?

Large Language Models Are In-Context Semantic Reasoners Rather Than Symbolic Reasoners

Reasoning with Large Language Models, a Survey

Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond

GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning

Reasoning or a Semblance of it? A Diagnostic Study of Transitive Reasoning in LLMs

Large Language Models Are Not Strong Abstract Reasoners

Enhancing Logical Reasoning in Large Language Models to Facilitate Legal Applications

GraphReason: Enhancing Reasoning Capabilities of Large Language Models through A Graph-Based Verification Approach

Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data

Can Large Language Models put 2 and 2 together? Probing for Entailed Arithmetical Relationships

Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus

Can Large Language Models Reason about the Region Connection Calculus?

A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences