Abstract: Large language models (LLMs) need sufficient context to handle many critical applications, such as retrieval-augmented generation and few-shot learning. However, because of the constrained window size, an LLM can only access information within a limited context. Although the context window can be extended by fine-tuning, doing so incurs a substantial cost in both the training and inference stages. In this paper, we present Extensible Tokenization as an alternative method that realizes flexible scaling of LLMs' context. Extensible Tokenization acts as middleware between the tokenized context and the LLM, transforming the raw token embeddings into extensible embeddings. Such embeddings provide a more compact representation of the long context, on top of which the LLM is able to perceive more information within the same context window. Extensible Tokenization is also characterized by its flexibility: the scaling factor can be chosen freely within a feasible range, which allows the context to be extended to an arbitrary length at inference time. In addition, Extensible Tokenization is introduced as a drop-in component that can be seamlessly plugged into not only the LLM itself but also its fine-tuned derivatives, bringing in the extended contextual information while fully preserving the LLM's existing capabilities. We perform comprehensive experiments on long-context language modeling and understanding tasks, which verify Extensible Tokenization as an effective, efficient, flexible, and compatible method for extending an LLM's context. Our model and source code will be made publicly available.
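
To make the role of the scaling factor concrete, the following is a minimal sketch of the arithmetic only. It assumes a stand-in compressor that mean-pools every `scaling_factor` consecutive token embeddings into one vector; the abstract does not specify how the extensible embeddings are actually produced, so the pooling step and the names `compress_embeddings` and `effective_context` are hypothetical and are not the paper's method.

```python
from typing import List


def compress_embeddings(
    token_embeddings: List[List[float]], scaling_factor: int
) -> List[List[float]]:
    """Hypothetical stand-in: mean-pool each run of `scaling_factor` token
    embeddings into a single compact embedding (NOT the paper's method)."""
    if scaling_factor < 1:
        raise ValueError("scaling_factor must be >= 1")
    compact = []
    for start in range(0, len(token_embeddings), scaling_factor):
        chunk = token_embeddings[start:start + scaling_factor]
        dim = len(chunk[0])
        # Average each dimension over the vectors in this chunk.
        compact.append(
            [sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)]
        )
    return compact


def effective_context(window_size: int, scaling_factor: int) -> int:
    """Raw tokens covered when every window slot holds one compact
    embedding that summarizes `scaling_factor` tokens."""
    return window_size * scaling_factor


# Eight toy 2-dimensional token embeddings, compressed with a factor of 4.
tokens = [[float(i), float(2 * i)] for i in range(8)]
compact = compress_embeddings(tokens, scaling_factor=4)
print(len(tokens), "->", len(compact))
print(effective_context(window_size=4096, scaling_factor=8))
```

Under these assumptions, a window of 4096 slots with a scaling factor of 8 would cover 32768 raw tokens. Changing the factor at inference time changes the covered length without changing the window itself.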

Tuning Large Language Model for Speech Recognition With Mixed-Scale Re-Tokenization

A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition

Prompting Large Language Models with Speech Recognition Abilities

Efficient Streaming LLM for Speech Recognition

Using Large Language Model for End-to-End Chinese ASR and NER

Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions

A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models

A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR

Large-scale Language Model Rescoring on Long-form Data

Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization

Tuning Large language model for End-to-end Speech Translation

Speech Recognition Rescoring with Large Speech-Text Foundation Models

Advancing Multi-talker ASR Performance with Large Language Models

End-to-End Speech Recognition Contextualization with Large Language Models

ReTok: Replacing Tokenizer to Enhance Representation Efficiency in Large Language Model

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

Connecting Speech Encoder and Large Language Model for ASR

Speech-based Slot Filling using Large Language Models

Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study