Abstract: Despite the advances in large language models (LLMs), how they use their knowledge for reasoning is not yet well understood. In this study, we propose a method that deconstructs complex real-world questions into a graph, representing each question as a node with predecessors of background knowledge needed to solve the question. We develop the DepthQA dataset, deconstructing questions into three depths: (i) recalling conceptual knowledge, (ii) applying procedural knowledge, and (iii) analyzing strategic knowledge. Based on a hierarchical graph, we quantify forward discrepancy, a discrepancy in LLM performance on simpler sub-problems versus complex questions. We also measure backward discrepancy where LLMs answer complex questions but struggle with simpler ones. Our analysis shows that smaller models exhibit more discrepancies than larger models. Distinct patterns of discrepancies are observed across model capacity and possibility of training data memorization. Additionally, guiding models from simpler to complex questions through multi-turn interactions improves performance across model sizes, highlighting the importance of structured intermediate steps in knowledge reasoning. This work enhances our understanding of LLM reasoning and suggests ways to improve their problem-solving abilities.

What problem does this paper attempt to address?

The paper asks how to better understand the way large language models (LLMs) use knowledge and reason when solving complex problems. Despite significant progress on many tasks, how LLMs leverage knowledge for reasoning remains unclear. To this end, the authors propose a method that decomposes complex real-world questions into a graph, where each node represents a question and its predecessor nodes represent the background knowledge needed to answer it. This lets the authors quantify how LLM performance differs between simpler sub-questions and complex questions, and thereby better understand the models' reasoning abilities. Specifically, the paper addresses the following key issues:

1. **How to decompose complex problems into a graph structure**: The authors develop the DepthQA dataset, which decomposes questions into three depths: conceptual knowledge (D1), procedural knowledge (D2), and strategic knowledge (D3). Questions at different depths are connected as nodes in a hierarchical graph.
2. **How to quantify the reasoning gap of LLMs**: Based on the hierarchical graph, the authors define two measures. Forward discrepancy captures the gap between an LLM's performance on simpler sub-problems and its performance on the complex questions built on them. Backward discrepancy captures the opposite case, where an LLM answers a complex question but struggles with the simpler ones beneath it.
3. **How LLMs of different scales perform in reasoning**: The authors compare LLMs of different sizes across the three depths and find that smaller models exhibit more forward and backward discrepancies than larger models.
4. **How to improve LLMs' reasoning ability through multi-turn interaction**: The authors find that guiding a model step by step from simpler questions to complex ones improves performance across model sizes, highlighting the importance of structured intermediate steps in knowledge reasoning.

In summary, the paper uses graph structures and hierarchical decomposition to analyze and evaluate how LLMs use knowledge and reason when solving complex problems, and it offers new insights and methods for improving these models.
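The graph and the two discrepancy measures can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `QuestionNode` class, the `score` field (a correctness score scaled to [0, 1]), and the formulas, which simply compare a node's score with the average score of its direct predecessors. The paper's actual definitions are not given in this summary and may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class QuestionNode:
    """Hypothetical node in a depth-structured question graph."""
    qid: str
    depth: int    # 1 = conceptual (D1), 2 = procedural (D2), 3 = strategic (D3)
    score: float  # assumed correctness score in [0, 1]
    predecessors: List[str] = field(default_factory=list)


def discrepancies(graph: Dict[str, QuestionNode]) -> Dict[str, Tuple[float, float]]:
    """Return (forward, backward) for every node that has predecessors.

    Simplified assumption, not the paper's formula:
      forward  = how far the node scores below its predecessors' average
      backward = how far the node scores above its predecessors' average
    """
    result = {}
    for node in graph.values():
        if not node.predecessors:
            continue  # D1 nodes have no simpler sub-questions to compare against
        pred_avg = sum(graph[p].score for p in node.predecessors) / len(node.predecessors)
        forward = max(0.0, pred_avg - node.score)
        backward = max(0.0, node.score - pred_avg)
        result[node.qid] = (forward, backward)
    return result


# Toy graph with made-up scores.
graph = {
    "d1_a": QuestionNode("d1_a", depth=1, score=1.0),
    "d1_b": QuestionNode("d1_b", depth=1, score=0.5),
    "d2_a": QuestionNode("d2_a", depth=2, score=0.25, predecessors=["d1_a", "d1_b"]),
    "d3_a": QuestionNode("d3_a", depth=3, score=0.75, predecessors=["d2_a"]),
}

result = discrepancies(graph)
for qid, (fwd, bwd) in result.items():
    print(f"{qid}: forward={fwd:.2f} backward={bwd:.2f}")
```

In this toy graph, `d2_a` scores below the average of its D1 predecessors, so it shows a forward discrepancy, while `d3_a` scores above its D2 predecessor, so it shows a backward discrepancy.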

Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization
