Abstract: In-context learning (ICL) facilitates large language models (LLMs) exhibiting spectacular emergent capabilities in various scenarios. Unfortunately, introducing demonstrations easily makes the prompt length explode, bringing a significant burden to hardware. In addition, random demonstrations usually achieve limited improvements in ICL, necessitating demonstration selection among accessible candidates. Previous studies introduce extra modules to perform demonstration compression or selection independently. In this paper, we propose an ICL framework UniICL, which Unifies demonstration selection and compression, and final response generation via a single frozen LLM. Specifically, UniICL first projects actual demonstrations and inference text inputs into short virtual tokens, respectively. Then, virtual tokens are applied to select suitable demonstrations by measuring semantic similarity within latent space among candidate demonstrations and inference input. Finally, inference text inputs together with selected virtual demonstrations are fed into the same frozen LLM for response generation. Notably, UniICL is a parameter-efficient framework that only contains 17M trainable parameters originating from the projection layer. We conduct experiments and analysis over in- and out-domain datasets of both generative and understanding tasks, encompassing ICL scenarios with plentiful and limited demonstration candidates. Results show that UniICL effectively unifies $12 \times$ compression, demonstration selection, and response generation, efficiently scaling up the baseline from 4-shot to 64-shot ICL in IMDb with 24 GB CUDA allocation.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses two issues in In-Context Learning (ICL): the excessive input length caused by introducing examples, and the limited improvement typically achieved by randomly selected examples. Specifically:

1. **Input Length Explosion**: In ICL, introducing demonstrations significantly increases input length, imposing a heavy burden on hardware and reducing inference throughput.
2. **Quality of Example Selection**: Randomly selected examples usually bring only limited performance improvements, so effective selection from the available candidate examples is needed.

To tackle these issues, existing research often introduces additional modules that perform example compression or selection independently. However, these methods increase memory overhead, because the independent compressors or rankers must be loaded alongside the target large language model (LLM).

### Solution

This paper proposes a new ICL framework, UniICL, which unifies example selection, compression, and final response generation through a single frozen LLM. The specific contributions are as follows:

1. **Unified Framework**: UniICL is presented as the first ICL framework to unify example compression, selection, and generation through a single frozen LLM.
2. **Memory-Friendly**: UniICL is a parameter-efficient framework containing only 17M trainable parameters, enabling large-scale ICL on consumer-grade GPUs.
3. **Demonstration Bank Configuration**: UniICL configures a Demonstration Bank (DB) to avoid compressing the same examples repeatedly, improving ICL efficiency.

### Method Overview

1. **Example Compression**: UniICL leverages the semantic understanding capabilities of the target LLM to compress each example independently into compressed features, then uses a learnable projection layer to convert these features into compressed virtual tokens that the LLM can accept as input.
2. **Example Selection**: The compressed virtual tokens are used not only to replace the original examples and reduce input length, but also to select suitable examples for the current query.
3. **Response Generation**: The current query and the selected compressed virtual tokens are fed into the same frozen LLM, which generates the response autoregressively.

### Experimental Results

Experimental results show that UniICL effectively unifies 12x compression, example selection, and response generation, scaling the baseline from 4-shot to 64-shot ICL on IMDb under a 24 GB CUDA allocation. Additionally, the summary reports strong performance on multiple benchmarks, including language acceptability, semantic classification, text summarization, and paragraph reordering tasks.

### Conclusion

The UniICL framework proposed in this paper unifies example selection, compression, and generation through a single frozen LLM, addressing the issues of input length explosion and example selection quality in ICL. The experimental results support the effectiveness and efficiency of the framework.
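
As a rough illustration of the example selection step and the Demonstration Bank described in the Method Overview, the sketch below caches one compressed vector per example and ranks candidates by similarity to the query. It is not the paper's implementation: `toy_compress` is a hypothetical bag-of-words stand-in for the frozen LLM plus projection layer, each example is reduced to a single vector rather than a sequence of virtual tokens, and cosine similarity is an assumed choice of similarity measure. The names `DemonstrationBank` and `select_demonstrations` are invented for this example.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]

# Hypothetical toy vocabulary; a real system would use the LLM's latent space.
VOCAB = ("great", "terrible", "movie", "weather")


def toy_compress(text: str) -> Vector:
    """Toy stand-in for compression: count occurrences of each VOCAB word."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class DemonstrationBank:
    """Caches compressed vectors so each example is compressed only once."""

    def __init__(self, compress: Callable[[str], Vector]) -> None:
        self._compress = compress
        self._cache: Dict[str, Vector] = {}
        self.compress_calls = 0  # counts cache misses, for illustration

    def get(self, demonstration: str) -> Vector:
        if demonstration not in self._cache:
            self._cache[demonstration] = self._compress(demonstration)
            self.compress_calls += 1
        return self._cache[demonstration]


def select_demonstrations(
    query_vec: Sequence[float],
    candidates: Sequence[str],
    bank: DemonstrationBank,
    k: int,
) -> List[str]:
    """Return the k candidates whose cached vectors are most similar to the query."""
    ranked = sorted(
        candidates,
        key=lambda demo: cosine_similarity(query_vec, bank.get(demo)),
        reverse=True,
    )
    return ranked[:k]


candidates = ["great movie", "terrible movie", "rainy weather today"]
bank = DemonstrationBank(toy_compress)
query_vec = toy_compress("what a great movie")
print(select_demonstrations(query_vec, candidates, bank, k=2))
# prints ['great movie', 'terrible movie']
```

Calling `select_demonstrations` again with a different query reuses the cached vectors, so `bank.compress_calls` stays at the number of distinct candidates. This mirrors the stated purpose of the Demonstration Bank, which is to avoid compressing the same examples more than once.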

Unifying Demonstration Selection and Compression for In-Context Learning
