OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

Jintian Zhang, Cheng Peng, Mengshu Sun, Xiang Chen, Lei Liang, Zhiqiang Zhang, Jun Zhou, Huajun Chen, Ningyu Zhang
2024-10-02
Abstract: Despite the recent advancements in Large Language Models (LLMs), which have significantly enhanced the generative capabilities for various NLP tasks, LLMs still face limitations in directly handling retrieval tasks. However, many practical applications demand the seamless integration of both retrieval and generation. This paper introduces a novel and efficient One-pass Generation and retrieval framework (OneGen), designed to improve LLMs' performance on tasks that require both generation and retrieval. The proposed framework bridges the traditionally separate training approaches for generation and retrieval by incorporating retrieval tokens generated autoregressively. This enables a single LLM to handle both tasks simultaneously in a unified forward pass. We conduct experiments on two distinct types of composite tasks, RAG and Entity Linking, to validate the pluggability, effectiveness, and efficiency of OneGen in training and inference. Furthermore, our results show that integrating generation and retrieval within the same context preserves the generative capabilities of LLMs while improving retrieval performance. To the best of our knowledge, OneGen is the first to enable LLMs to conduct vector retrieval during the generation.
Computation and Language, Artificial Intelligence, Databases, Information Retrieval, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to handle generation and retrieval tasks efficiently within a single large language model (LLM). Although current LLMs perform well on a wide range of natural language processing (NLP) tasks, they remain limited when handling retrieval directly, yet many practical applications require generation and retrieval to be seamlessly integrated. The traditional approach trains the generation model and the retrieval model separately, which leads to several problems:

1. **Hardware Overhead**: Deploying and maintaining two independent models incurs additional hardware and maintenance costs.
2. **Separation of Representation Spaces**: The generation model and the retrieval model use separate representation spaces, which limits the interaction between the two.
3. **Inference Computation Cost**: Each query requires an additional forward pass through the retriever, increasing the inference cost.
4. **Problems in Multi-round Dialogues**: In multi-round dialogues, rewriting the query for follow-up questions adds inference overhead and may propagate errors.
5. **Difficulty in End-to-End Optimization**: The traditional pipeline is difficult to optimize end-to-end, even though end-to-end optimization has been shown to significantly improve performance.

To solve these problems, the paper proposes a new framework, OneGen, which handles generation and retrieval in a single forward pass. Specifically, OneGen unifies the two tasks by introducing retrieval tokens that are generated autoregressively, enabling one LLM to handle both simultaneously. According to the paper, the framework improves efficiency while preserving generation ability and improving retrieval performance.

### Main Contributions:

1. **Proposing an Efficient OneGen Framework**, which is especially suitable for tasks where generation and retrieval are intertwined.
2. **Demonstrating Superior Performance on Multiple Datasets**, especially on six RAG datasets and six entity-linking datasets.
3. **Showing the Advantages of OneGen in Terms of Inference Speed and Memory Consumption**, especially as the query length or the retrieval frequency increases.
4. **From a methodological perspective**, OneGen is an extension of Generative Instruction Tuning (GIT) and Representative Instruction Tuning (RIT).
5. **Making the Datasets and Code Open-Source**, contributing to the community.

Through these contributions, OneGen provides an efficient and unified solution for integrating generation and retrieval tasks.
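
The single-pass idea described above can be illustrated with a minimal toy sketch. It is not the authors' implementation: the token name `[RQ]`, the hand-written hidden states, the three-document index, and the use of cosine similarity are all assumptions made for illustration. In a real system the hidden states would come from the LLM's forward pass and the index would hold precomputed document embeddings.

```python
import math

# Hypothetical name for the special retrieval token; the summary does not name it.
RETRIEVAL_TOKEN = "[RQ]"


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index):
    """Return the id of the indexed document most similar to query_vec."""
    return max(index, key=lambda doc_id: cosine(query_vec, index[doc_id]))


def one_pass_generate(steps, index):
    """Consume a simulated decoder trace of (token, hidden_state) pairs.

    Whenever the decoder emits the retrieval token, the hidden state at that
    position is reused as the query embedding, so no separate retriever
    forward pass is run.
    """
    output, retrieved = [], []
    for token, hidden in steps:
        output.append(token)
        if token == RETRIEVAL_TOKEN:
            retrieved.append(retrieve(hidden, index))
    return output, retrieved


# Toy document index: id -> embedding (made-up values).
index = {
    "doc_paris": [1.0, 0.0, 0.0],
    "doc_tokyo": [0.0, 1.0, 0.0],
    "doc_cairo": [0.0, 0.0, 1.0],
}

# Simulated decoder trace standing in for a real LLM's per-token outputs.
steps = [
    ("The", [0.2, 0.2, 0.2]),
    ("capital", [0.3, 0.1, 0.1]),
    (RETRIEVAL_TOKEN, [0.1, 0.9, 0.2]),
    ("is", [0.1, 0.1, 0.1]),
]

tokens, docs = one_pass_generate(steps, index)
print(tokens, docs)
```

The point of the sketch is the control flow: generation and retrieval share one pass over the sequence, and the retrieval query is a by-product of decoding rather than the output of a second model.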