Abstract: Retrieval-augmented large language models (LLMs) leverage relevant content retrieved by information retrieval systems to generate correct responses, aiming to alleviate the hallucination problem. However, existing retriever-responder methods typically append the retrieved documents to the LLM prompt for text generation without considering the interaction of fine-grained structural semantics between the retrieved documents and the LLM. This matters for accurate response generation because LLMs tend to get ``lost in the middle'' when the input prompt is augmented with lengthy documents. In this work, we propose a new pipeline named ``Reinforced Retriever-Reorder-Responder'' (R$^4$) that learns document orderings for retrieval-augmented LLMs, thereby further enhancing their generation abilities while the LLM parameters remain frozen. The reordering learning process is divided into two steps according to the quality of the generated responses: document order adjustment and document representation enhancement. Specifically, document order adjustment arranges the retrieved documents into beginning, middle, and end positions based on graph attention learning, so as to maximize the reinforced reward of response quality. For responses of poor quality, document representation enhancement further refines the representations of the retrieved documents via document-level gradient adversarial learning. Extensive experiments demonstrate that our proposed pipeline achieves better factual question-answering performance on knowledge-intensive tasks than strong baselines across various public datasets. The source code and trained models will be released upon paper acceptance.
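
To make the idea of position-aware ordering concrete, the following is a minimal sketch of a fixed heuristic that places the highest-scored documents at the beginning and end of the prompt and the lowest-scored ones in the middle. It is not the learned reordering described in the abstract (which uses graph attention and a reinforced reward); the function name `reorder_for_edges` and the relevance scores are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple


def reorder_for_edges(docs: List[Tuple[str, float]]) -> List[str]:
    """Arrange documents so that high-scored ones sit at the prompt edges.

    `docs` is a list of (text, relevance_score) pairs; the scores are
    assumed to come from some retriever (hypothetical here).
    """
    # Rank documents from most to least relevant.
    ranked = sorted(docs, key=lambda d: d[1], reverse=True)

    front: List[str] = []
    back: List[str] = []
    # Alternate: even ranks fill the front, odd ranks fill the back.
    for i, (text, _score) in enumerate(ranked):
        if i % 2 == 0:
            front.append(text)
        else:
            back.append(text)

    # Reverse the back half so the second-best document ends up last
    # and the least relevant documents meet in the middle.
    return front + back[::-1]


if __name__ == "__main__":
    retrieved = [("a", 0.9), ("b", 0.8), ("c", 0.5), ("d", 0.3), ("e", 0.1)]
    print(reorder_for_edges(retrieved))
```

A learned reorderer, as proposed in the abstract, would replace this fixed alternation with an ordering chosen to maximize a reward on response quality.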

Pre-training with Large Language Model-based Document Expansion for Dense Passage Retrieval

Generative Retrieval with Large Language Models

Longtriever: a Pre-trained Long Text Encoder for Dense Document Retrieval

Exploring the Best Practices of Query Expansion with Large Language Models

In-context Pretraining: Language Modeling Beyond Document Boundaries

Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment

Query Rewriting for Retrieval-Augmented Large Language Models

Retrieval-Augmented Retrieval: Large Language Models Are Strong Zero-Shot Retriever

Query Rewriting in Retrieval-Augmented Large Language Models

Large Language Models are Built-in Autoregressive Search Engines

MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion

Query-as-context Pre-training for Dense Passage Retrieval

Large Language Models are Strong Zero-Shot Retriever

Corpus-Steered Query Expansion with Large Language Models

Query Expansion by Prompting Large Language Models

Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models

Large Language Model-guided Document Selection

R4: Reinforced Retriever-Reorder-Responder for Retrieval-Augmented Large Language Models

Fine-Tuning LLaMA for Multi-Stage Text Retrieval

ScalingNote: Scaling up Retrievers with Large Language Models for Real-World Dense Retrieval

Retrieval meets Long Context Large Language Models