Understanding Differential Search Index for Text Retrieval

Xiaoyang Chen, Yanjiang Liu, Ben He, Le Sun, Yingfei Sun
2023-05-23
Abstract: The Differentiable Search Index (DSI) is a novel information retrieval (IR) framework that utilizes a differentiable function to generate a sorted list of document identifiers in response to a given query. However, due to the black-box nature of the end-to-end neural architecture, it remains to be understood to what extent DSI possesses the basic indexing and retrieval abilities. To mitigate this gap, in this study, we define and examine three important abilities that a functioning IR framework should possess, namely, exclusivity, completeness, and relevance ordering. Our analytical experimentation shows that while DSI demonstrates proficiency in memorizing the unidirectional mapping from pseudo queries to document identifiers, it falls short in distinguishing relevant documents from random ones, thereby negatively impacting its retrieval effectiveness. To address this issue, we propose a multi-task distillation approach to enhance the retrieval quality without altering the structure of the model and successfully endow it with improved indexing abilities. Through experiments conducted on various datasets, we demonstrate that our proposed method outperforms previous DSI baselines.
Information Retrieval
What problem does this paper attempt to address?
The paper focuses on understanding and improving the Differentiable Search Index (DSI) framework for text retrieval. DSI is an information retrieval framework that uses a differentiable function to generate a ranked list of document identifiers for a given query. Because the end-to-end neural architecture is a black box, it remains unclear to what extent DSI possesses basic indexing and retrieval capabilities.

To address this gap, the authors define and evaluate three capabilities that a functioning information retrieval framework should have: exclusivity, completeness, and relevance ordering. Their analysis of existing DSI models shows that DSI is adept at memorizing the unidirectional mapping from pseudo-queries to document identifiers, but falls short in distinguishing relevant documents from random ones, which hurts its retrieval effectiveness.

To overcome this weakness, the authors propose a multi-task distillation approach that enhances retrieval quality without altering the model structure. The method improves the indexing capabilities of DSI by learning from dense retrieval models. In experiments on various datasets, the proposed method outperforms previous DSI baselines on multiple metrics.

In summary, this paper aims to:
1. Analyze the limitations of existing DSI models in terms of exclusivity, completeness, and relevance ordering.
2. Propose a multi-task distillation method to enhance the retrieval performance of DSI models, particularly in relevance ordering.
3. Demonstrate experimentally that the proposed method improves retrieval performance across multiple datasets.
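
The summary above does not spell out the paper's training objectives, so the following is only a generic sketch of what multi-task distillation from a dense retriever could look like, not the authors' implementation. It assumes the student (a DSI-style model) and the teacher (a dense retriever) each assign a score to the same small set of candidate document identifiers. The function names, the `alpha` weighting, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw scores into a probability distribution over candidates."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two distributions over the same candidates."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_task_loss(student_scores, teacher_scores, gold_index,
                    alpha=0.5, temperature=1.0):
    """Weighted sum of an indexing loss and a distillation loss.

    Hypothetical formulation (an assumption, not taken from the paper):
      - indexing task: cross-entropy on the gold document identifier
      - distillation task: KL divergence from the teacher's score
        distribution to the student's, which rewards the student for
        reproducing the teacher's relevance ordering
    """
    student_probs = softmax(student_scores)
    index_loss = -math.log(student_probs[gold_index])

    teacher_soft = softmax(teacher_scores, temperature)
    student_soft = softmax(student_scores, temperature)
    distill_loss = kl_divergence(teacher_soft, student_soft)

    return alpha * index_loss + (1 - alpha) * distill_loss


if __name__ == "__main__":
    teacher = [2.0, 1.0, -1.0]  # made-up teacher scores for three docids
    print(multi_task_loss([2.0, 1.0, -1.0], teacher, gold_index=0))
    print(multi_task_loss([-1.0, 1.0, 2.0], teacher, gold_index=0))
```

In this toy setup, a student whose scores match the teacher's ordering receives a lower loss than one that ranks the candidates in reverse. Because the distillation term only changes the training signal, a scheme of this kind leaves the model architecture untouched, which is consistent with the constraint described in the summary.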