Entity Retrieval for Answering Entity-Centric Questions

Hassan S. Shavarani, Anoop Sarkar
2024-08-06
Abstract: The similarity between the question and indexed documents is a crucial factor in document retrieval for retrieval-augmented question answering. Although this is typically the only method for obtaining the relevant documents, it is not the sole approach when dealing with entity-centric questions. In this study, we propose Entity Retrieval, a novel retrieval method which, rather than relying on question-document similarity, depends on the salient entities within the question to identify the retrieval documents. We conduct an in-depth analysis of the performance of both dense and sparse retrieval methods in comparison to Entity Retrieval. Our findings reveal that our method not only leads to more accurate answers to entity-centric questions but also operates more efficiently.
Information Retrieval, Computation and Language
What problem does this paper attempt to address?
The paper addresses how to retrieve relevant documents more accurately and efficiently for entity-centric questions, so that large language models (LLMs) can generate better answers. It proposes a new retrieval method, **Entity Retrieval**, which identifies the salient entities in the question and looks up the documents corresponding to those entities in a knowledge base (such as Wikipedia), instead of relying on traditional retrieval based on question-document similarity.

### Main problems

1. **Improving the accuracy of entity-centric questions**: Traditional retrieval methods (such as the sparse retriever BM25 and the dense retrievers DPR and ANCE) may retrieve irrelevant documents for entity-centric questions, which lowers answer accuracy. The proposed method aims to improve both the accuracy and the efficiency of retrieval by using salient entities.
2. **Reducing the impact of irrelevant documents**: The paper points out that irrelevant retrieved documents reduce system performance. Entity Retrieval therefore retrieves fewer documents while ensuring that those documents are relevant.
3. **Improving retrieval efficiency**: Traditional retrieval methods must store large indexes and load them at inference time, which is a challenge in resource-limited environments (such as mobile devices). Entity Retrieval simplifies this process by looking up the relevant documents directly in the knowledge base.

### Method overview

- **Entity identification**: Identify the salient entities in the question.
- **Knowledge base lookup**: Look up the documents corresponding to the identified entities in the knowledge base (such as Wikipedia).
- **Document truncation**: Truncate each retrieved document to its first W words; the truncated documents form the set used to augment the question.
- **Augmented question answering**: Provide the truncated documents as context to the LLM so that it can generate a more accurate answer.

### Experimental results

- **Performance evaluation**: The paper evaluates Entity Retrieval with multiple metrics (such as nDCG@k, MRR, and Top-k retrieval accuracy) and compares it with traditional retrieval methods.
- **Datasets**: The experiments use three datasets: EntityQuestions, FactoidQA, and StrategyQA.
- **Findings**: Entity Retrieval outperforms traditional retrieval methods in most cases, especially on entity-centric questions.

### Conclusion

The Entity Retrieval method proposed in the paper not only improves the accuracy of answers to entity-centric questions but also significantly improves retrieval efficiency. By reducing the interference of irrelevant documents, it supports LLMs more effectively in generating high-quality answers.
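
### Illustrative sketch

The four steps in the method overview (entity identification, knowledge base lookup, truncation to the first W words, and building the augmented prompt) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy `KNOWLEDGE_BASE`, the string-matching `identify_entities` helper, and the prompt layout in `build_prompt` are all hypothetical stand-ins: a real system would presumably use a proper entity linker and a Wikipedia-scale knowledge base.

```python
import re

# Hypothetical toy knowledge base: entity title -> article text.
# It stands in for a Wikipedia-scale store keyed by entity.
KNOWLEDGE_BASE = {
    "Marie Curie": (
        "Marie Curie was a physicist and chemist who conducted "
        "pioneering research on radioactivity. She was born in Warsaw."
    ),
    "Warsaw": (
        "Warsaw is the capital and largest city of Poland. "
        "It stands on the Vistula River."
    ),
}


def identify_entities(question, kb_titles):
    """Return the KB titles mentioned in the question, in order of appearance.

    Assumption: naive case-insensitive whole-word matching replaces the
    entity-identification step; it is only a placeholder.
    """
    found = []
    for title in kb_titles:
        pattern = r"\b" + re.escape(title) + r"\b"
        match = re.search(pattern, question, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), title))
    return [title for _, title in sorted(found)]


def truncate_words(text, w):
    """Keep only the first w whitespace-separated words of text."""
    return " ".join(text.split()[:w])


def entity_retrieval(question, kb, w=100):
    """Look up one document per identified entity and truncate each to w words."""
    entities = identify_entities(question, kb.keys())
    return [truncate_words(kb[entity], w) for entity in entities]


def build_prompt(question, documents):
    """Prepend the retrieved documents to the question (hypothetical layout)."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    question = "Where was Marie Curie born?"
    docs = entity_retrieval(question, KNOWLEDGE_BASE, w=20)
    print(build_prompt(question, docs))
```

Note that nothing in this sketch computes question-document similarity or loads a retrieval index: the only lookup is by entity key, which is the property the summary credits for the method's efficiency.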