A Survey of Generative Information Retrieval

Tzu-Lin Kuo, Tzu-Wei Chiu, Tzung-Sheng Lin, Sheng-Yang Wu, Chao-Wei Huang, Yun-Nung Chen
2024-06-04
Abstract: Generative Retrieval (GR) is an emerging paradigm in information retrieval that leverages generative models to directly map queries to relevant document identifiers (DocIDs) without the need for traditional query processing or document reranking. This survey provides a comprehensive overview of GR, highlighting key developments, indexing and retrieval strategies, and challenges. We discuss various document identifier strategies, including numerical and string-based identifiers, and explore different document representation methods. Our primary contribution lies in outlining future research directions that could profoundly impact the field: improving the quality of query generation, exploring learnable document identifiers, enhancing scalability, and integrating GR with multi-task learning frameworks. By examining state-of-the-art GR techniques and their applications, this survey aims to provide a foundational understanding of GR and inspire further innovations in this transformative approach to information retrieval. We also make complementary materials, such as the paper collection, publicly available at <a class="link-external link-https" href="https://github.com/MiuLab/GenIR-Survey/" rel="external noopener nofollow">https://github.com/MiuLab/GenIR-Survey/</a>.
Information Retrieval, Computation and Language
What problem does this paper attempt to address?
The paper surveys Generative Retrieval (GR), an emerging paradigm in Information Retrieval (IR). Specifically, it focuses on:

1. **Direct Document Mapping**: Using generative models to map queries directly to relevant document identifiers (DocIDs) without the need for traditional query processing or document re-ranking.
2. **Key Developments and Strategies**: Discussing document identifier strategies, including numerical and string-based identifiers, and exploring different document representation methods.
3. **Future Research Directions**: Proposing research directions that could profoundly impact the field, such as improving query generation quality, exploring learnable document identifiers, enhancing scalability, and integrating GR with multi-task learning frameworks.
4. **State-of-the-Art Techniques and Applications**: Reviewing state-of-the-art GR techniques and their applications to give readers a foundational understanding of GR and inspire further innovation in this approach to information retrieval.

Through these efforts, the paper aims to advance the field of generative retrieval and give researchers a comprehensive framework for understanding it.
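To make the idea of mapping a query directly to a DocID more concrete, here is a minimal sketch of one way generation can be restricted to identifiers that actually exist in the corpus: greedy decoding constrained by a prefix tree of DocID token sequences. This is an illustration under stated assumptions, not a method taken from the survey. The names `build_trie`, `constrained_greedy_decode`, `toy_score`, and the `<eos>` marker are hypothetical, and `toy_score` is a simple word-overlap stand-in for the next-token scores a trained generative model would supply.

```python
from typing import Callable, Dict, List, Sequence

END = "<eos>"  # hypothetical end-of-identifier marker


def build_trie(docids: Sequence[Sequence[str]]) -> Dict:
    """Build a nested-dict prefix tree over DocID token sequences."""
    root: Dict = {}
    for tokens in docids:
        node = root
        for tok in list(tokens) + [END]:
            node = node.setdefault(tok, {})
    return root


def constrained_greedy_decode(
    query: str,
    trie: Dict,
    score: Callable[[str, List[str], str], float],
) -> List[str]:
    """Greedily pick the best-scoring token among those the trie allows."""
    if not trie:
        raise ValueError("the trie contains no DocIDs")
    node, prefix = trie, []
    while True:
        allowed = sorted(node)  # sorted for deterministic tie-breaking
        best = max(allowed, key=lambda tok: score(query, prefix, tok))
        if best == END:
            return prefix
        prefix.append(best)
        node = node[best]


def toy_score(query: str, prefix: List[str], token: str) -> float:
    """Assumed stand-in for a model: reward tokens that appear in the query."""
    if token == END:
        return 0.5
    return 1.0 if token in query.lower().split() else 0.0


# String-based identifiers, tokenized into words (illustrative data only).
DOCIDS = [
    ["neural", "ranking"],
    ["neural", "indexing"],
    ["sparse", "retrieval"],
]

if __name__ == "__main__":
    trie = build_trie(DOCIDS)
    print(constrained_greedy_decode("how does neural indexing work", trie, toy_score))
```

Because every step chooses only among children of the current trie node, the decoded sequence is always a complete identifier from `DOCIDS`, even when the scorer has no useful signal for the query. A numerical identifier scheme would work the same way with digit tokens in place of words.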