One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models

Yutao Zhu, Zhaoheng Huang, Zhicheng Dou, Ji-Rong Wen

2024-06-08

Abstract: Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs) for generating more factual, accurate, and up-to-date content. Existing methods either optimize prompts to guide LLMs in leveraging retrieved information or directly fine-tune LLMs to adapt to RAG scenarios. Although fine-tuning can yield better performance, it often compromises the LLMs' general generation capabilities by modifying their parameters. This limitation poses challenges in practical applications, especially when LLMs are already deployed, as parameter adjustments may affect their original functionality. To address this, we propose a novel method that involves learning scalable and pluggable virtual tokens for RAG. By maintaining the LLMs' original parameters and fine-tuning only the embeddings of these pluggable tokens, our approach not only enhances LLMs' performance but also preserves their general generation capabilities. Furthermore, we design several training strategies to improve the scalability, flexibility, and generalizability of our method. Comprehensive experiments across nine question-answering tasks demonstrate the superiority of our approach.

Computation and Language

What problem does this paper attempt to address?

The paper addresses the tendency of large language models (LLMs) to generate hallucinated, outdated, or inaccurate content, especially in scenarios requiring long-tail knowledge. To tackle this, it proposes a method called SPRING, which improves LLM performance in retrieval-augmented generation (RAG) scenarios by introducing trainable virtual tokens while preserving the model's general generation capabilities. SPRING has the following features:

1. **Lightweight and Efficient**: Only the embeddings of the added virtual tokens are trained; the LLM's own parameters are left unchanged, so the method stays lightweight.
2. **Scalability**: The training method allows the number of virtual tokens to be adjusted to the needs of the inference scenario, and performance improves significantly even with a single token.
3. **Plug-and-Play**: The virtual tokens are added only when retrieval is triggered and omitted in non-RAG scenarios, which preserves the LLM's original generation capabilities.
4. **Strong Generalization**: The training strategy lets SPRING adapt to different retrievers and to varying numbers of retrieved results, so it does not need retraining each time the retrieval system is updated.

Experimental results show that SPRING improves LLM performance on RAG tasks while retaining general generation capabilities on non-RAG tasks. SPRING also outperforms other methods across the evaluated tasks and is robust to different retrievers.
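The scalable and plug-and-play behavior described above can be illustrated with a minimal sketch of prompt assembly. Everything named here is an assumption made for illustration: the token strings (`[r1]`, `[r2]`, ...), the pool size of 50, the placement of the tokens between the retrieved passages and the question, and the `build_prompt` helper are hypothetical and not taken from the summary. In a real system the embeddings of these tokens would be the only trained parameters; this sketch covers only how the tokens are plugged in or left out.

```python
from typing import List, Optional

# Hypothetical pool of trained virtual tokens; names and pool size are assumptions.
VIRTUAL_TOKENS: List[str] = [f"[r{i}]" for i in range(1, 51)]


def build_prompt(question: str,
                 passages: Optional[List[str]] = None,
                 k: int = 1) -> str:
    """Assemble an LLM input, plugging in k virtual tokens only for RAG.

    - Non-RAG call (no passages): the question is returned unchanged, so the
      frozen LLM sees exactly the input it would see without this method.
    - RAG call: the first k virtual tokens are placed between the retrieved
      passages and the question (an assumed layout).
    """
    if not passages:
        # Plug-and-play: omit the virtual tokens when retrieval is not used.
        return question
    if not 1 <= k <= len(VIRTUAL_TOKENS):
        raise ValueError(f"k must be between 1 and {len(VIRTUAL_TOKENS)}")
    # Scalability: the caller chooses how many tokens to use at inference time.
    plug = "".join(VIRTUAL_TOKENS[:k])
    context = "\n".join(passages)
    return f"{context}\n{plug}\n{question}"


if __name__ == "__main__":
    docs = ["Passage A about the topic.", "Passage B about the topic."]
    print(build_prompt("What is the topic?"))               # non-RAG input
    print(build_prompt("What is the topic?", docs, k=1))    # a single token
    print(build_prompt("What is the topic?", docs, k=3))    # three tokens
```

Taking a prefix of a fixed token pool is one simple way to make the token count adjustable at inference time; whether it matches the paper's exact training procedure is not stated in the summary above.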

Retrieval-Augmented Generation for Large Language Models: A Survey

A Theory for Token-Level Harmonization in Retrieval-Augmented Generation

MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training

Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy

Bridging the Preference Gap between Retrievers and LLMs

Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models

AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant

Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In

RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation

Retrieve Anything To Augment Large Language Models

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues

A Survey on Retrieval-Augmented Text Generation for Large Language Models

Large Language Models (LLMs): Deployment, Tokenomics and Sustainability

Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation

Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models

RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation

Benchmarking Large Language Models in Retrieval-Augmented Generation