Abstract: The similarity between the question and indexed documents is a crucial factor in document retrieval for retrieval-augmented question answering. Although this is typically the only method for obtaining the relevant documents, it is not the sole approach when dealing with entity-centric questions. In this study, we propose Entity Retrieval, a novel retrieval method which, rather than relying on question-document similarity, depends on the salient entities within the question to identify the retrieval documents. We conduct an in-depth analysis of the performance of both dense and sparse retrieval methods in comparison to Entity Retrieval. Our findings reveal that our method not only leads to more accurate answers to entity-centric questions but also operates more efficiently.

What problem does this paper attempt to address?

This paper asks how to retrieve relevant documents more accurately and efficiently for entity-centric questions, so that large language models (LLMs) can generate better answers. It proposes a new retrieval method, **Entity Retrieval**, which identifies the salient entities in the question and looks up the documents corresponding to those entities in a knowledge base (such as Wikipedia), instead of relying on the traditional approach based on question-document similarity.

### Main problems

1. **Improving the accuracy of entity-centric questions**: Traditional retrieval methods (such as the sparse retriever BM25 and the dense retrievers DPR and ANCE) may return irrelevant documents for entity-centric questions, which lowers answer accuracy. The proposed method aims to improve both the accuracy and the efficiency of retrieval by using salient entities.
2. **Reducing the impact of irrelevant documents**: The paper points out that irrelevant retrieved documents degrade system performance. Entity Retrieval therefore retrieves fewer documents while keeping the retrieved ones relevant.
3. **Improving retrieval efficiency**: Traditional retrieval methods must store large indexes and load them at inference time, which is a challenge in resource-limited environments (such as mobile devices). Entity Retrieval simplifies this process by looking up relevant documents directly in the knowledge base.

### Method overview

- **Entity identification**: Identify the salient entities in the question.
- **Knowledge base lookup**: Look up the documents corresponding to the identified entities in the knowledge base (such as Wikipedia).
- **Document truncation**: Truncate each retrieved document to its first W words to form the document set used to augment the question.
- **Enhanced question answering**: Provide the truncated documents as context to the LLM so that it can generate more accurate answers.

### Experimental results

- **Performance evaluation**: The paper evaluates Entity Retrieval with multiple metrics (such as nDCG@k, MRR, and Top-k retrieval accuracy) and compares it with traditional retrieval methods.
- **Experimental data sets**: The experiments use three data sets: EntityQuestions, FactoidQA, and StrategyQA.
- **Experimental results**: Entity Retrieval outperforms traditional retrieval methods in most cases, especially on entity-centric questions.

### Conclusion

The Entity Retrieval method proposed in the paper improves the accuracy of answers to entity-centric questions and also improves retrieval efficiency. By reducing the interference of irrelevant documents, it supports LLMs more effectively in generating high-quality answers.
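The four steps listed under the method overview can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the in-memory `KNOWLEDGE_BASE`, the substring-based `identify_entities`, and the prompt layout in `build_prompt` are all hypothetical stand-ins, and the summary does not say which entity-identification component the authors use.

```python
from typing import Dict, List

# Hypothetical toy knowledge base: entity title -> article text.
# A real system would look entities up in Wikipedia or a similar resource.
KNOWLEDGE_BASE: Dict[str, str] = {
    "Marie Curie": "Marie Curie was a physicist and chemist who was born in Warsaw.",
    "Warsaw": "Warsaw is the capital and largest city of Poland.",
}


def identify_entities(question: str, kb: Dict[str, str]) -> List[str]:
    """Naive stand-in for salient-entity identification (an assumption):
    return every knowledge-base title that occurs in the question."""
    lowered = question.lower()
    return [title for title in kb if title.lower() in lowered]


def truncate(text: str, w: int) -> str:
    """Keep only the first w whitespace-separated words of a document."""
    return " ".join(text.split()[:w])


def entity_retrieval(question: str, kb: Dict[str, str], w: int = 100) -> List[str]:
    """Look up the article of each identified entity and truncate it to w words."""
    return [truncate(kb[entity], w) for entity in identify_entities(question, kb)]


def build_prompt(question: str, documents: List[str]) -> str:
    """Assemble an augmented prompt; the layout here is hypothetical."""
    context = "\n".join(documents)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    q = "Where was Marie Curie born?"
    print(build_prompt(q, entity_retrieval(q, KNOWLEDGE_BASE, w=20)))
```

Note that no similarity index is built or loaded in this sketch: retrieval reduces to a dictionary lookup keyed by entity, which is the efficiency argument the summary attributes to the method.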
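For readers unfamiliar with the retrieval metrics named under the experimental results, the following sketch shows one common way to compute MRR, Top-k retrieval accuracy, and nDCG@k from binary relevance labels. The function names are illustrative, and the paper's exact definitions may differ. In particular, this version computes the ideal ranking for nDCG from the retrieved list only, which is a simplifying assumption.

```python
import math
from typing import List


def reciprocal_rank(relevance: List[int]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[List[int]]) -> float:
    """Average reciprocal rank over all questions."""
    return sum(reciprocal_rank(r) for r in runs) / len(runs)


def top_k_accuracy(runs: List[List[int]], k: int) -> float:
    """Fraction of questions with at least one relevant document in the top k."""
    return sum(1 for r in runs if any(r[:k])) / len(runs)


def ndcg_at_k(relevance: List[int], k: int) -> float:
    """nDCG@k with a log2 discount; the ideal ordering is taken from the
    retrieved list itself (a simplifying assumption)."""
    def dcg(rels: List[int]) -> float:
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Each inner list marks, per ranked position, whether the document is relevant.
    runs = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(mean_reciprocal_rank(runs), top_k_accuracy(runs, 2), ndcg_at_k(runs[0], 3))
```

Each inner list represents one question's ranked results, with 1 marking a relevant document and 0 an irrelevant one.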

Entity Retrieval for Answering Entity-Centric Questions

Entity-Relation Extraction As Multi-Turn Question Answering

Information Retrieval with Entity Linking

Early Stage Sparse Retrieval with Entity Linking

DREQ: Document Re-Ranking Using Entity-based Query Understanding

Features and Aggregators for Web-scale Entity Search

On Type-Aware Entity Retrieval

Towards Self-Contained Answers: Entity-Based Answer Rewriting in Conversational Search

Community Question Answering Entity Linking via Leveraging Auxiliary Data

Retrieval Helps or Hurts? A Deeper Dive into the Efficacy of Retrieval Augmentation to Language Models

Research on Dual-Dimensional Entity Association-Based Question and Answering Technology for Smart Medicine

Identifying and exploiting target entity type information for ad hoc entity retrieval

QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations

Incremental Entity Resolution from Linked Documents

NERank+: a Graph-Based Approach for Entity Ranking in Document Collections.

Cross-modal Retrieval for Knowledge-based Visual Question Answering

Early Fusion Strategy for Entity-Relationship Retrieval

Entity-aware Transformers for Entity Search

Towards Better Text Understanding and Retrieval Through Kernel Entity Salience Modeling.

A Technical Report: Entity Extraction Using Both Character-based and Token-based Similarity

Leveraging Contextual Information for Effective Entity Salience Detection