Abstract: Conventional document retrieval techniques are mainly based on the index-retrieve paradigm, and pipelines built on this paradigm are difficult to optimize end to end. As an alternative, generative retrieval represents documents as identifiers (docids) and retrieves documents by generating their docids, enabling end-to-end modeling of document retrieval tasks. However, how document identifiers should be defined remains an open question. Current approaches rely on fixed, rule-based docids, such as the title of a document or the result of clustering BERT embeddings, which often fail to capture the complete semantic information of a document. We propose GenRet, a document tokenization learning method that addresses the challenge of defining document identifiers for generative retrieval. GenRet learns to tokenize documents into short discrete representations (i.e., docids) via a discrete auto-encoding approach. GenRet comprises three components: (i) a tokenization model that produces docids for documents; (ii) a reconstruction model that learns to reconstruct a document from its docid; and (iii) a sequence-to-sequence retrieval model that directly generates the identifiers of relevant documents for a given query. By using an auto-encoding framework, GenRet learns semantic docids in a fully end-to-end manner. We also develop a progressive training scheme to capture the autoregressive nature of docids and to stabilize training. We conduct experiments on the NQ320K, MS MARCO, and BEIR datasets to assess the effectiveness of GenRet. GenRet establishes a new state of the art on the NQ320K dataset. In particular, compared to generative retrieval baselines, GenRet achieves significant improvements on unseen documents. GenRet also outperforms comparable baselines on MS MARCO and BEIR, demonstrating the method's generalizability.
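
To make "retrieving documents by generating docids" concrete, the following is a minimal sketch, not GenRet's implementation. It assumes that docids are fixed-length tuples of integer tokens and that generation is restricted to valid docids with a prefix trie and beam search, a common device in generative retrieval that the abstract itself does not describe. The names build_trie, constrained_beam_search, and toy_score are hypothetical, and toy_score is a hand-written table standing in for a trained sequence-to-sequence retrieval model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

DocId = Tuple[int, ...]


def build_trie(docids: Sequence[DocId]) -> dict:
    """Nest dicts so that each root-to-leaf path spells one valid docid."""
    root: dict = {}
    for docid in docids:
        node = root
        for token in docid:
            node = node.setdefault(token, {})
    return root


def constrained_beam_search(
    trie: dict,
    score_token: Callable[[DocId, int], float],
    length: int,
    beam_size: int = 2,
) -> List[Tuple[DocId, float]]:
    """Generate docids token by token, expanding only prefixes found in the trie.

    Assumes every docid has exactly `length` tokens (an assumption of this sketch).
    """
    beams = [((), 0.0, trie)]  # (prefix so far, summed log-score, trie node)
    for _ in range(length):
        candidates = []
        for prefix, score, node in beams:
            for token, child in node.items():
                new_score = score + score_token(prefix, token)
                candidates.append((prefix + (token,), new_score, child))
        # Keep the highest-scoring prefixes for the next step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return [(prefix, score) for prefix, score, _ in beams]


# Hypothetical corpus: three documents with two-token docids.
corpus: Dict[DocId, str] = {(0, 1): "doc A", (0, 2): "doc B", (1, 1): "doc C"}

# Hypothetical per-position log-scores for one fixed query.
SCORES = {0: {0: -0.1, 1: -2.0}, 1: {1: -0.5, 2: -0.3}}


def toy_score(prefix: DocId, token: int) -> float:
    """Stand-in for the retrieval model's log-probability of the next token."""
    return SCORES[len(prefix)].get(token, -10.0)


ranked = constrained_beam_search(build_trie(list(corpus)), toy_score, length=2)
for docid, _ in ranked:
    print(docid, corpus[docid])
```

Because only prefixes present in the trie are expanded, every returned sequence is the docid of a document in the corpus, so the ranked output maps directly back to documents without a separate index lookup.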

Enhancing Generative Retrieval with Reinforcement Learning from Relevance Feedback

ROGER: Ranking-oriented Generative Retrieval

Generative Retrieval Meets Multi-Graded Relevance

Re3val: Reinforced and Reranked Generative Retrieval

Listwise Generative Retrieval Models via a Sequential Learning Process

Learning to Rank in Generative Retrieval

Distillation Enhanced Generative Retrieval

Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy

Generative Question Refinement with Deep Reinforcement Learning in Retrieval-based QA System

Generative Relevance Feedback and Convergence of Adaptive Re-Ranking: University of Glasgow Terrier Team at TREC DL 2023

Tuning Query Reformulator with Fine-Grained Relevance Feedback

Learning to Tokenize for Generative Retrieval

GRM: Generative Relevance Modeling Using Relevance-Aware Sample Estimation for Document Retrieval

Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback

Relevance is a Guiding Light: Relevance-aware Adaptive Learning for End-to-end Task-oriented Dialogue System

DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering

Generative Relevance Feedback with Large Language Models

Unveiling the Magic: Investigating Attention Distillation in Retrieval-augmented Generation

EnsembleGAN: Adversarial Learning for Retrieval-Generation Ensemble Model on Short-Text Conversation

From Matching to Generation: A Survey on Generative Information Retrieval