Abstract: Multi-hop reasoning, which requires multi-step reasoning based on the supporting documents within a given context, remains challenging for large language models (LLMs). LLMs often struggle to filter out irrelevant documents within the context, and their performance is sensitive to the position of supporting documents within that context. In this paper, we identify an additional challenge: LLMs' performance is also sensitive to the order in which the supporting documents are presented. We refer to this as the misordered context problem. To address this issue, we propose a simple yet effective method called context repetition (CoRe), which involves prompting the model by repeatedly presenting the context to ensure the supporting documents are presented in the optimal order for the model. Using CoRe, we improve the F1 score by up to 30%p on multi-hop QA tasks and increase accuracy by up to 70%p on a synthetic task. Additionally, CoRe helps mitigate the well-known "lost-in-the-middle" problem in LLMs and can be effectively combined with retrieval-based approaches utilizing Chain-of-Thought (CoT) reasoning.
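The claim that repetition can put the supporting documents in a usable order can be illustrated with a small subsequence check. The sketch below is not taken from the paper: the document labels, the assumed hop order, and the helper names (`is_subsequence`, `repetitions_needed`, `covers_all_orders`) are all hypothetical. It only shows the combinatorial point that if a context contains k supporting documents, then k copies of that context contain every ordering of those documents as an in-order subsequence.

```python
from itertools import permutations


def is_subsequence(target, sequence):
    """Return True if `target` appears in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator until it finds `item`,
    # so each match must come after the previous one.
    return all(item in remaining for item in target)


def repetitions_needed(context, reasoning_order):
    """Smallest number of context copies that contains `reasoning_order` in order."""
    for k in range(1, len(reasoning_order) + 1):
        if is_subsequence(reasoning_order, context * k):
            return k
    raise ValueError("a supporting document is missing from the context")


def covers_all_orders(context, supporting_docs):
    """Check that k copies of the context contain every ordering of k documents."""
    k = len(supporting_docs)
    return all(
        is_subsequence(order, context * k)
        for order in permutations(supporting_docs)
    )


# Hypothetical context: supporting documents doc_A and doc_B plus a distractor
# doc_N, presented in the reverse of the assumed hop order (doc_A, then doc_B).
context = ["doc_B", "doc_N", "doc_A"]
reasoning_order = ["doc_A", "doc_B"]

print(repetitions_needed(context, reasoning_order))  # 2
print(covers_all_orders(context, reasoning_order))   # True
```

With a single copy, doc_A never precedes doc_B. With two copies, the model can read doc_A in the first copy and doc_B in the second.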

What problem does this paper attempt to address?

This paper attempts to address several key challenges that large language models (LLMs) face in multi-hop reasoning tasks. Specifically:

1. **Order Sensitivity of Supporting Documents**: The performance of LLMs is highly sensitive to the order in which supporting documents are presented. If that order does not match the model's reasoning process, performance may decline significantly. The authors refer to this issue as the "misordered context problem."
2. **Lost-in-the-Middle Problem**: When supporting documents are located in the middle of the context, the model may fail to identify them, which degrades reasoning performance.
3. **Interference from Noisy Documents**: LLMs struggle to filter out noisy documents that are irrelevant to the correct answer, which can severely impair reasoning performance.

To address these issues, the authors propose a simple yet effective method called "Context Repetition" (CoRe). By presenting the context repeatedly, they ensure that the supporting documents reach the model in the optimal order, thereby improving its performance in multi-hop reasoning tasks.

### Main Contributions

1. **Introduction of the Misordered Context Problem**: Revealed the impact of the order of supporting documents on LLM performance and defined it as a key challenge.
2. **Theoretical Analysis and Method Proposal**: Proposed, on theoretical grounds, solving this problem by augmenting the context, and introduced the CoRe method.
3. **Experimental Validation**: Conducted experiments on multiple multi-hop QA benchmark datasets and a synthetic task, demonstrating the effectiveness of the CoRe method.

### Experimental Results

- On multi-hop QA datasets such as HotpotQA, 2WikiMultihopQA, and MuSiQue, the CoRe method improved the model's F1 score by up to 30 percentage points.
- In the synthetic task, the CoRe method improved the model's accuracy by up to 70 percentage points.
- Beyond improving performance on multi-hop reasoning tasks, the CoRe method alleviated the "Lost-in-the-Middle Problem" and can be combined with retrieval-based methods.

### Conclusion

By introducing the CoRe method, this study addresses the sensitivity of LLMs to the order of supporting documents and thereby improves reasoning performance, especially in complex multi-hop reasoning tasks.
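A minimal sketch of what context repetition could look like at the prompt level, assuming a plain-text prompt. The template (`Document i:`, `Question:`, `Answer:`), the function name `build_core_prompt`, and the `repetitions` parameter are illustrative assumptions, not the prompt format used by the authors.

```python
def build_core_prompt(documents, question, repetitions=2):
    """Build a prompt in which the whole context appears `repetitions` times.

    The layout is a hypothetical template chosen for illustration only.
    """
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    # Number the documents so each copy of the context is identical.
    context = "\n".join(
        f"Document {i}: {text}" for i, text in enumerate(documents, start=1)
    )
    # Repeat the full context, separated by blank lines, before the question.
    repeated = "\n\n".join([context] * repetitions)
    return f"{repeated}\n\nQuestion: {question}\nAnswer:"


# Hypothetical two-hop example with the supporting facts in reverse hop order.
docs = [
    "The Seine flows through Paris.",
    "The Louvre is located in Paris.",
]
print(build_core_prompt(docs, "Which river flows through the city of the Louvre?"))
```

Each additional repetition lengthens the prompt by one full copy of the context, so in this sketch the number of repetitions trades ordering coverage against context length.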

Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning

Seemingly Plausible Distractors in Multi-Hop Reasoning: Are Large Language Models Attentive Readers?

Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning

Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought

Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?

Making Long-Context Language Models Better Multi-Hop Reasoners

Multimodal Chain-of-Thought Reasoning in Language Models

CoQ: An Empirical Framework for Multi-hop Question Answering Empowered by Large Language Models

Hint Marginalization for Improved Reasoning in Large Language Models

Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding

Concise and Organized Perception Facilitates Reasoning in Large Language Models

INFORM : Information Entropy Based Multi-Step Reasoning FOR Large Language Models

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models

Triggering Multi-Hop Reasoning for Question Answering in Language Models using Soft Prompts and Random Walks

Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models

Information Re-Organization Improves Reasoning in Large Language Models

Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Do Large Language Models Latently Perform Multi-Hop Reasoning?

Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning