Abstract: Reasoning ability is one of the most crucial capabilities of a foundation model, signifying its capacity to address complex reasoning tasks. The Chain-of-Thought (CoT) technique is widely regarded as an effective method for enhancing the reasoning ability of foundation models and has garnered significant attention. However, the reasoning process of CoT is linear and step-by-step, similar to an individual's logical reasoning, and is suited to general, slightly complicated problems. By contrast, an expert's thinking pattern has two prominent characteristics that CoT cannot handle appropriately: high-order multi-hop reasoning and multimodal comparative judgement. The core motivation of this paper is therefore to transcend CoT and construct a reasoning paradigm that can think like an expert. Because a hyperedge of a hypergraph can connect any number of vertices, hypergraphs are naturally suited to modelling high-order relationships. Inspired by this, this paper proposes a multimodal Hypergraph-of-Thought (HoT) reasoning paradigm, which gives foundation models the expert-level abilities of high-order multi-hop reasoning and multimodal comparative judgement. Specifically, a textual hypergraph-of-thought is constructed using triples as the primary thoughts to model higher-order relationships, and hyperedges-of-thought are generated from multi-hop walking paths to achieve multi-hop inference. Furthermore, we devise a visual hypergraph-of-thought that interacts with the textual hypergraph-of-thought via Cross-modal Co-Attention Graph Learning for multimodal comparative verification. Experiments on the ScienceQA benchmark demonstrate that the proposed HoT-based T5 outperforms CoT-based GPT-3.5 and ChatGPT and is on par with CoT-based GPT-4 despite a smaller model size.
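To make the textual construction concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes that the head and tail entities of each triple are the hypergraph's vertices, that every triple contributes one hyperedge joining its head and tail, and that each uniform multi-hop random walk contributes one hyperedge-of-thought made of the entities it visits. The function names (`build_adjacency`, `random_walk`, `build_hypergraph`), the parameters `hops` and `walks_per_node`, and the example triples are all hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(triples):
    """Undirected entity graph in which the head and tail of each triple are neighbours."""
    adj = defaultdict(set)
    for head, _relation, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def random_walk(adj, start, hops, rng):
    """Uniform random walk of at most `hops` steps, returned as the list of visited entities."""
    path = [start]
    for _ in range(hops):
        neighbours = adj.get(path[-1], [])
        if not neighbours:
            break
        path.append(rng.choice(neighbours))
    return path


def build_hypergraph(triples, hops=2, walks_per_node=1, seed=0):
    """Return (vertices, hyperedges, H) where H[v][e] == 1 iff vertex v lies in hyperedge e."""
    rng = random.Random(seed)
    adj = build_adjacency(triples)
    vertices = sorted(adj)
    # Assumption: one hyperedge per triple, joining its head and tail entities.
    hyperedges = [frozenset((head, tail)) for head, _rel, tail in triples]
    # Assumption: each multi-hop walk yields one hyperedge-of-thought (its visited entities).
    for v in vertices:
        for _ in range(walks_per_node):
            hyperedges.append(frozenset(random_walk(adj, v, hops, rng)))
    hyperedges = list(dict.fromkeys(hyperedges))  # deduplicate, keep first-seen order
    row = {v: i for i, v in enumerate(vertices)}
    H = [[0] * len(hyperedges) for _ in vertices]
    for j, edge in enumerate(hyperedges):
        for v in edge:
            H[row[v]][j] = 1
    return vertices, hyperedges, H


# Hypothetical ScienceQA-style triples.
triples = [
    ("magnet", "attracts", "iron"),
    ("iron", "is_a", "metal"),
    ("metal", "conducts", "electricity"),
]
vertices, hyperedges, H = build_hypergraph(triples, hops=3)
print(vertices)  # ['electricity', 'iron', 'magnet', 'metal']
```

The incidence matrix `H` (vertices × hyperedges) is the usual input format for hypergraph neural networks. Walk hyperedges let a single hyperedge link entities that are several hops apart in the triple graph, which is the intuition behind using them for multi-hop inference.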

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to address the limitations of foundation models in complex reasoning tasks. Existing Chain-of-Thought (CoT) techniques can enhance the reasoning capabilities of foundation models, but their linear, step-by-step reasoning process cannot handle two characteristics of expert thinking: high-order multi-hop reasoning and multimodal comparative judgement. The core motivation of the paper is therefore to go beyond CoT and construct a reasoning paradigm that can think like an expert.

### Specific Problems

1. **Limitations of Linear Reasoning**: The reasoning process of CoT is linear, making it difficult to perform leapfrog (non-sequential) logical reasoning.
2. **Insufficiency of Multi-step Reasoning**: CoT decomposes problems into sequential steps and gives insufficient consideration to steps that should run concurrently or that conflict with one another.
3. **Limitations on Specialized Problems**: Existing CoT methods are mainly suited to general and slightly complex problems but perform poorly on specialized problems.

### Solution

To overcome these limitations, the paper proposes the Hypergraph-of-Thought (HoT) reasoning paradigm, which comprises the following components:

1. **Textual Hypergraph-of-Thought**: Triples serve as the basic units for modeling high-order relationships, and long-distance reasoning paths generated by multi-hop random walks form hyperedges, giving the model multi-hop reasoning capability.
2. **Visual Hypergraph-of-Thought**: Image patches are grouped by k-means clustering, and each cluster forms a hyperedge of the visual hypergraph.
3. **Cross-modal Interaction**: A Cross-modal Co-Attention Graph Learning module lets the textual and visual hypergraphs interact, enabling multimodal comparative verification and avoiding conflicts between information from different modalities.
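The visual hypergraph and the cross-modal interaction can be sketched in the same spirit. This is a minimal illustration under stated assumptions, not the paper's module: patch features are plain 2-D vectors, k-means is Lloyd's algorithm with a deterministic farthest-first seeding, each visual hyperedge is embedded by mean-pooling its patches, and the co-attention step is parameter-free bidirectional scaled dot-product attention with no learned projections and no hypergraph convolution. All function names and example values are hypothetical.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding: start from points[0], then repeatedly take the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=20):
    """Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def visual_hyperedges(patch_features, k):
    """Each k-means cluster of patch indices becomes one visual hyperedge."""
    labels = kmeans(patch_features, k)
    edges = [frozenset(i for i, lab in enumerate(labels) if lab == c) for c in range(k)]
    return [e for e in edges if e]  # drop empty clusters


def hyperedge_embeddings(patch_features, edges):
    """Mean-pool the patch features inside each hyperedge (assumed pooling choice)."""
    dim = len(patch_features[0])
    return [[sum(patch_features[i][d] for i in e) / len(e) for d in range(dim)] for e in edges]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [x / total for x in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention without learned projections."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        w = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(wi * v[d] for wi, v in zip(w, values)) for d in range(len(values[0]))])
    return out


def co_attention(text_nodes, visual_nodes):
    """Text attends to vision and vision attends to text (bidirectional cross-attention)."""
    return (attend(text_nodes, visual_nodes, visual_nodes),
            attend(visual_nodes, text_nodes, text_nodes))


# Hypothetical 2-D patch features forming two obvious groups.
patches = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [0.1, 0.1],
           [5.0, 5.1], [5.1, 5.0], [5.0, 5.0], [5.1, 5.1]]
edges = visual_hyperedges(patches, k=2)
visual_nodes = hyperedge_embeddings(patches, edges)
text_nodes = [[0.0, 0.0], [5.0, 5.0]]  # hypothetical textual node embeddings
text_ctx, visual_ctx = co_attention(text_nodes, visual_nodes)
print(sorted(sorted(e) for e in edges))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

In a trained model, the attention inputs would pass through learned projections and the attended context would be fused back into each modality's hypergraph representation; that fusion is omitted here to keep the sketch small.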
### Experimental Results

Experimental results show that the HoT-based T5 model significantly outperforms the CoT-based GPT-3.5 and ChatGPT on the ScienceQA benchmark and is comparable to the CoT-based GPT-4 despite a smaller model size. This demonstrates the effectiveness of the HoT paradigm in enhancing the high-order multi-hop reasoning and multimodal comparative judgement abilities of foundation models.

Thinking Like an Expert:Multimodal Hypergraph-of-Thought (HoT) Reasoning to boost Foundation Modals

Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models

Understanding Reasoning in Chain-of-Thought from the Hopfieldian View

A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning

CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity

Multimodal Chain-of-Thought Reasoning in Language Models

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings

ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models

Markov Chain of Thought for Efficient Mathematical Reasoning

KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

Multi-modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models

Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding

Design of Chain-of-Thought in Math Problem Solving

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration

Enhancing human-like multimodal reasoning: a new challenging dataset and comprehensive framework

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities

Supervised Chain of Thought

Synergy-of-Thoughts: Eliciting Efficient Reasoning in Hybrid Language Models