Abstract: Retrieving external knowledge and prompting large language models with relevant information is an effective paradigm to enhance the performance of question-answering tasks. Previous research typically handles paragraphs from external documents in isolation, resulting in a lack of context and ambiguous references, particularly in multi-document and complex tasks. To overcome these challenges, we propose a new retrieval framework IIER, that leverages Inter-chunk Interactions to Enhance Retrieval. This framework captures the internal connections between document chunks by considering three types of interactions: structural, keyword, and semantic. We then construct a unified Chunk-Interaction Graph to represent all external documents comprehensively. Additionally, we design a graph-based evidence chain retriever that utilizes previous paths and chunk interactions to guide the retrieval process. It identifies multiple seed nodes based on the target question and iteratively searches for relevant chunks to gather supporting evidence. This retrieval process refines the context and reasoning chain, aiding the large language model in reasoning and answer generation. Extensive experiments demonstrate that IIER outperforms strong baselines across four datasets, highlighting its effectiveness in improving retrieval and reasoning capabilities.

What problem does this paper attempt to address?

This paper addresses how to effectively retrieve external knowledge and generate accurate answers in multi-document question answering (MDQA) with large language models (LLMs). Specifically, existing methods often treat paragraphs from external documents in isolation, leading to a lack of contextual information and ambiguous references, especially when dealing with complex tasks. To overcome these challenges, the paper proposes a new retrieval framework, IIER (Inter-chunk Interactions to Enhance Retrieval), which enhances retrieval effectiveness by leveraging interactions between document chunks.

### Main Issues

1. **Lack of Contextual Information**: Existing methods typically treat each paragraph as an independent unit when processing external documents, ignoring the relationships between paragraphs, resulting in insufficient contextual information.
2. **Ambiguous References**: Due to the lack of context, pronouns and ambiguous expressions in paragraphs may affect the correct understanding of their original semantics.
3. **Complex Multi-document Tasks**: Accurately understanding and covering all supporting evidence remains an open problem when dealing with multi-document and complex tasks.

### Solution

The paper proposes the IIER framework to address the above issues through the following points:

1. **Constructing a Chunk-Interaction Graph (CIG)**: All documents are divided into chunks, with each chunk serving as a node in the graph, and these nodes are connected through three types of interactions (structural, keyword, and semantic).
2. **Graph-based Evidence Chain Retriever**: A graph-based evidence chain retriever is designed to guide the retrieval process using previous path information and interactions between chunks, gradually approaching the supporting evidence.
3. **Multi-path Retrieval Strategy**: A multi-path retrieval strategy is employed, starting from multiple seed nodes to gradually build a complete reasoning chain, assisting the LLM in reasoning and answer generation.

### Experimental Results

Experimental results show that IIER significantly outperforms existing baseline methods across four datasets, particularly in tasks requiring strict multi-hop reasoning and high retrieval precision. Specifically:

- On the 2WikiMQA dataset, IIER's accuracy improved by 15% over baseline methods.
- On the HotpotQA dataset, accuracy improved by 6.0%.
- On the IIRC and MuSiQue datasets, accuracy improved by 3.5% and 2.0%, respectively.

### Conclusion

By leveraging various interactions between document chunks, IIER can more comprehensively capture potential semantic and logical connections, achieving more accurate retrieval and reasoning capabilities in multi-document question answering tasks. This approach not only improves retrieval efficiency but also reduces transfer overhead, providing strong support for the application of large language models in complex tasks.
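
To make the Chunk-Interaction Graph and the multi-seed evidence chain retrieval described in the Solution section more concrete, here is a minimal Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`build_chunk_graph`, `retrieve_chains`), the stopword list, the use of bag-of-words cosine similarity as a stand-in for semantic similarity, the `semantic_threshold` value, and the greedy next-hop rule. The paper presumably uses learned representations and a trained retriever.

```python
import math
import re
from collections import Counter, defaultdict

# Toy stopword list (assumption; a real system would use a proper keyword extractor).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "was", "by", "to",
             "it", "he", "she", "who", "what", "which", "where", "did", "that"}

def tokens(text):
    """Lowercased word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_chunk_graph(docs, semantic_threshold=0.25):
    """docs maps doc_id -> list of chunk texts. Returns (chunks, bows, edges)."""
    chunks = [(doc_id, i, text)
              for doc_id, texts in docs.items()
              for i, text in enumerate(texts)]
    bows = [Counter(tokens(text)) for _, _, text in chunks]
    edges = defaultdict(dict)  # node -> {neighbor: set of interaction types}

    def link(u, v, kind):
        edges[u].setdefault(v, set()).add(kind)
        edges[v].setdefault(u, set()).add(kind)

    for u in range(len(chunks)):
        for v in range(u + 1, len(chunks)):
            # Structural: adjacent chunks of the same document.
            if chunks[u][0] == chunks[v][0] and chunks[v][1] - chunks[u][1] == 1:
                link(u, v, "structural")
            # Keyword: the two chunks share at least one non-stopword token.
            if bows[u].keys() & bows[v].keys():
                link(u, v, "keyword")
            # Semantic: bag-of-words cosine stands in for embedding similarity.
            if cosine(bows[u], bows[v]) >= semantic_threshold:
                link(u, v, "semantic")
    return chunks, bows, edges

def retrieve_chains(question, chunks, bows, edges, num_seeds=2, max_hops=2):
    """Pick seed chunks for the question, then greedily extend one chain per seed."""
    q = Counter(tokens(question))
    ranked = sorted(range(len(chunks)), key=lambda n: cosine(q, bows[n]), reverse=True)
    chains = []
    for seed in ranked[:num_seeds]:
        path = [seed]
        for _ in range(max_hops):
            # The next hop is scored against the question plus the path so far.
            context = q + sum((bows[n] for n in path), Counter())
            candidates = [n for n in edges[path[-1]] if n not in path]
            if not candidates:
                break
            path.append(max(candidates, key=lambda n: cosine(context, bows[n])))
        chains.append(path)
    return chains

docs = {
    "A": ["Marie Curie was born in Warsaw.",
          "She won the Nobel Prize in Physics in 1903."],
    "B": ["Warsaw is the capital of Poland."],
}
chunks, bows, edges = build_chunk_graph(docs)
chains = retrieve_chains("In which country was Marie Curie born?",
                         chunks, bows, edges, num_seeds=1, max_hops=1)
for chain in chains:
    print([chunks[n][2] for n in chain])
```

In this toy example the question matches only the first chunk of document A, so that chunk becomes the seed. The chain then follows the keyword edge on "Warsaw" into document B. This is the kind of cross-document hop that retrieval over isolated paragraphs would miss, because the document B chunk shares no terms with the question itself. Note that with bag-of-words vectors every semantic edge is also a keyword edge; the two interaction types only become distinct once real embeddings are used.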

Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering

Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy

Retrieval-Generation Synergy Augmented Large Language Models

Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering

Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding

Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models

Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering

Retrieval and Reasoning on KGs: Integrate Knowledge Graphs into Large Language Models for Complex Question Answering

ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering

Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding

Bridging the Preference Gap between Retrievers and LLMs

Enhancing Question Answering for Enterprise Knowledge Bases using Large Language Models

Reimagining Retrieval Augmented Language Models for Answering Queries

Refiner: Restructure Retrieval Content Efficiently to Advance Question-Answering Capabilities

Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

FiDeLiS: Faithful Reasoning in Large Language Model for Knowledge Graph Question Answering

EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs

Retrieval-based Video Language Model for Efficient Long Video Question Answering

Graph Neural Network Enhanced Retrieval for Question Answering of LLMs

Understanding Retrieval Augmentation for Long-Form Question Answering