Abstract: Despite the recent advancements in Large Language Models (LLMs), which have significantly enhanced the generative capabilities for various NLP tasks, LLMs still face limitations in directly handling retrieval tasks. However, many practical applications demand the seamless integration of both retrieval and generation. This paper introduces a novel and efficient One-pass Generation and retrieval framework (OneGen), designed to improve LLMs' performance on tasks that require both generation and retrieval. The proposed framework bridges the traditionally separate training approaches for generation and retrieval by incorporating retrieval tokens generated autoregressively. This enables a single LLM to handle both tasks simultaneously in a unified forward pass. We conduct experiments on two distinct types of composite tasks, RAG and Entity Linking, to validate the pluggability, effectiveness, and efficiency of OneGen in training and inference. Furthermore, our results show that integrating generation and retrieval within the same context preserves the generative capabilities of LLMs while improving retrieval performance. To the best of our knowledge, OneGen is the first to enable LLMs to conduct vector retrieval during the generation.

What problem does this paper attempt to address?

This paper addresses how a large language model (LLM) can handle generation and retrieval efficiently at the same time. Although current LLMs perform well on a wide range of natural language processing (NLP) tasks, they remain limited when asked to handle retrieval directly, and many practical applications need generation and retrieval to work together seamlessly. The traditional approach trains the generation model and the retrieval model separately, which leads to several problems:

1. **Hardware Overhead**: Deploying and maintaining two independent models incurs additional hardware and maintenance costs.
2. **Separation of Representation Spaces**: The generation model and the retrieval model use separate representation spaces, which limits the interaction between the two.
3. **Inference Computation Cost**: Each query requires an additional forward pass through the retriever, increasing the inference cost.
4. **Problems in Multi-round Dialogues**: In multi-round dialogues, rewriting the query for each follow-up question adds inference overhead and may propagate errors.
5. **Difficulty in End-to-End Optimization**: The traditional pipeline is difficult to optimize end to end, even though end-to-end optimization has been shown to improve performance significantly.

To solve these problems, the paper proposes a new framework, OneGen, which handles generation and retrieval in a single forward pass. Specifically, OneGen unifies the two tasks by introducing retrieval tokens that are generated autoregressively, so that one LLM can perform both at once. The framework improves efficiency while preserving generation ability and enhancing retrieval performance.

### Main Contributions:

1. **Proposing an Efficient OneGen Framework**, which is especially suitable for tasks where generation and retrieval are intertwined.
2. **Demonstrating Superior Performance on Multiple Datasets**, in particular on six RAG datasets and six entity-linking datasets.
3. **Showing the Advantages of OneGen in Terms of Inference Speed and Memory Consumption**, especially as the query length or the retrieval frequency increases.
4. **Extending Existing Instruction-Tuning Methods**: from a methodological perspective, OneGen is an extension of Generative Instruction Tuning (GIT) and Representative Instruction Tuning (RIT).
5. **Making the Datasets and Code Open-Source**, contributing to the community.

Through these contributions, OneGen provides an efficient and unified solution for integrating generation and retrieval tasks.
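
The single-pass idea summarized above can be illustrated with a small toy sketch. It is not the paper's implementation: the token name `[RQ]`, the function `one_pass_generate`, the scripted token list (which stands in for a real LLM's decoding), and the bag-of-words "hidden state" are all assumptions made for illustration. The only point it tries to show is that when a retrieval token is emitted, the representation computed in that same generation step is reused as the retrieval query, so no separate retriever forward pass is counted.

```python
import math

RETRIEVAL_TOKEN = "[RQ]"  # hypothetical name for the special retrieval token

def embed(words):
    """Toy stand-in for a hidden state: a unit-norm bag-of-words vector."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {w: c / norm for w, c in counts.items()}

def cosine(a, b):
    """Dot product of two unit-norm sparse vectors."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())

def one_pass_generate(prompt, scripted_tokens, corpus):
    """Replay scripted tokens (a real LLM would choose them). When the
    retrieval token appears, reuse the representation from that same step
    as the query; no second model is called."""
    doc_vectors = {doc_id: embed(text.split()) for doc_id, text in corpus.items()}
    context = prompt.split()
    retrieved = []
    forward_passes = 0
    for token in scripted_tokens:
        forward_passes += 1  # one pass per generated token, retrieval included
        if token == RETRIEVAL_TOKEN:
            # Assumption: the context so far plays the role of the hidden
            # state at the retrieval-token position.
            hidden = embed([w for w in context if w != RETRIEVAL_TOKEN])
            best = max(doc_vectors, key=lambda d: cosine(hidden, doc_vectors[d]))
            retrieved.append(best)
        context.append(token)
    return context, retrieved, forward_passes

corpus = {
    "doc_paris": "paris is the capital of france",
    "doc_tokyo": "tokyo is the capital of japan",
}
context, retrieved, passes = one_pass_generate(
    "what is the capital of france", [RETRIEVAL_TOKEN, "paris"], corpus
)
print(retrieved, passes)  # ['doc_paris'] 2
```

In a conventional pipeline, the same two-token continuation would also require encoding the query with a separate retriever model; here the pass count equals the number of generated tokens.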

OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

A + B: A General Generator-Reader Framework for Optimizing LLMs to Unleash Synergy Potential

UniGen: A Unified Generative Framework for Retrieval and Question Answering with Large Language Models

ToolGen: Unified Tool Retrieval and Calling via Generation

UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy

Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

Retrieval-Augmented Generation for Large Language Models: A Survey

IterGen: Iterative Structured LLM Generation

IDGenRec: LLM-RecSys Alignment with Textual ID Learning

GeneRAG: Enhancing Large Language Models with Gene-Related Task by Retrieval-Augmented Generation

Bridging the Preference Gap between Retrievers and LLMs

CorpusLM: Towards a Unified Language Model on Corpus for Knowledge-Intensive Tasks

LLMCad: Fast and Scalable On-device Large Language Model Inference

Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation

LLMs Meet Multimodal Generation and Editing: A Survey

GenRec: Large Language Model for Generative Recommendation

ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval

LLMGA: Multimodal Large Language Model based Generation Assistant

MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training