Abstract: Lexical and semantic matches are commonly used as relevance measurements for information retrieval. Together they estimate the semantic equivalence between the query and the candidates. However, semantic equivalence is not the only relevance signal that needs to be considered when retrieving evidence for multi-hop questions. In this work, we demonstrate that the textual entailment relation is another important relevance dimension that should be considered. To retrieve evidence that is either semantically equivalent to or entailed by the question simultaneously, we divide the task of evidence retrieval for multi-hop question answering (QA) into two sub-tasks, i.e., semantic textual similarity and inference similarity retrieval. We propose two ensemble models, EAR and EARnest, which tackle each of the sub-tasks separately and then jointly re-rank sentences with the consideration of the diverse relevance signals. Experimental results on HotpotQA verify that our models not only significantly outperform all the single retrieval models they are based on, but are also more effective than two intuitive ensemble baseline models.

What problem does this paper attempt to address?

This paper addresses evidence retrieval for multi-hop question answering (multi-hop QA), where relevance depends not only on semantic equivalence but also on textual entailment. Most existing evidence retrieval methods rely on term matching or semantic similarity to determine relevance and ignore inference signals. For complex questions that require multi-step reasoning, term or semantic similarity alone cannot capture all relevant evidence: a piece of evidence may be related to the question inferentially, for example as a premise from which a conclusion follows, rather than lexically or semantically.

The paper therefore decomposes evidence retrieval into two sub-tasks, semantic textual similarity retrieval and inference similarity retrieval, and uses ensemble models to combine the results of the two sub-tasks and improve retrieval accuracy for multi-hop questions.

The main contributions of the paper are as follows:
1. It emphasizes that in complex question-answering systems, textual entailment is an important dimension of evidence relevance alongside traditional semantic similarity.
2. It proposes two ensemble models, EAR and EARnest, which combine different relevance signals to retrieve evidence for multi-hop questions more effectively.
3. Its experimental results on HotpotQA show that the proposed models significantly outperform the individual retrieval models they are built on and are also more effective than two intuitive ensemble baseline models.

Through this method, the paper aims to provide a more comprehensive and accurate evidence retrieval strategy, especially for complex questions that require multi-step reasoning.
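
To make the idea of joint re-ranking concrete, the following is a minimal sketch of one way to combine two relevance signals. It is not the EAR or EARnest model: the summary does not say how those models combine their scores, so the weighted sum, the min-max normalization, the function names `min_max` and `rerank`, the weight `alpha`, and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tie on this signal, so it carries no ranking information.
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def rerank(
    sts: Dict[str, float],
    inference: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank the union of candidates by a weighted sum of both signals.

    Hypothetical combination rule (an assumption, not taken from the paper):
    alpha weights semantic textual similarity, (1 - alpha) weights the
    inference score. A candidate missing from one retriever scores 0.0 there.
    """
    sts_n, inf_n = min_max(sts), min_max(inference)
    candidates = set(sts_n) | set(inf_n)
    combined = {
        c: alpha * sts_n.get(c, 0.0) + (1 - alpha) * inf_n.get(c, 0.0)
        for c in candidates
    }
    # Sort by descending score; break ties by candidate id for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy scores for four candidate sentences (made up for illustration).
sts_scores = {"s1": 0.9, "s2": 0.2, "s3": 0.5}
inference_scores = {"s2": 0.8, "s3": 0.6, "s4": 0.1}

for sentence_id, score in rerank(sts_scores, inference_scores):
    print(sentence_id, round(score, 3))
```

In this toy run, `s3` ranks first even though it tops neither list, because it scores moderately on both signals. That is the kind of evidence a single-signal retriever could rank too low.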

Divide & Conquer for Entailment-aware Multi-hop Evidence Retrieval