3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding

Xindian Ma, Wenyuan Liu, Peng Zhang, Nan Xu

2024-06-14

Abstract: Inspired by the Bloch sphere representation, we propose a novel rotary position encoding on a three-dimensional sphere, named 3D Rotary Position Encoding (3D-RPE). 3D-RPE is an advanced version of the widely used 2D Rotary Position Encoding (RoPE), with two major advantages for modeling long contexts: controllable long-term decay and improved position resolution. For controllable long-term decay, 3D-RPE allows the long-term decay to be regulated within the chunk size, so that relative positional information can still be modeled between tokens that are far apart. For enhanced position resolution, 3D-RPE can mitigate the degradation of position resolution that position interpolation causes in RoPE. We conducted experiments on long-context Natural Language Understanding (NLU) and long-sequence Language Modeling (LM) tasks. The results show that 3D-RPE improves performance over RoPE, especially on long-context NLU tasks.
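For reference, the RoPE baseline that 3D-RPE extends can be sketched in a few lines of NumPy. This follows the standard RoFormer formulation (frequencies theta_i = 10000^(-2i/d), each adjacent pair of dimensions rotated by angle m * theta_i); it is not code from the 3D-RPE paper, and the helper names `rope_freqs`, `rope_rotate`, and `decay_bound` are our own. The last function computes the relative upper bound used in RoFormer to illustrate long-term decay, (1/(d/2)) * sum_j |S_j| with S_j = sum_{k<j} e^{i r theta_k}, which tends to shrink as the relative distance r grows.

```python
import numpy as np

def rope_freqs(d, base=10000.0):
    # theta_i = base^(-2i/d) for i = 0 .. d/2 - 1 (RoFormer convention).
    return base ** (-np.arange(0, d, 2) / d)

def rope_rotate(x, pos, base=10000.0):
    """Rotate each adjacent pair (x[2i], x[2i+1]) of vector x by pos * theta_i."""
    ang = pos * rope_freqs(x.shape[-1], base)
    cos, sin = np.cos(ang), np.sin(ang)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out

def decay_bound(rel, d=128, base=10000.0):
    """RoFormer-style relative upper bound: mean over j of |S_j|,
    where S_j is the partial sum of e^{i * rel * theta_k} for k < j."""
    theta = rope_freqs(d, base)
    partial_sums = np.cumsum(np.exp(1j * rel * theta))  # S_1 .. S_{d/2}
    return np.abs(partial_sums).mean()

rng = np.random.default_rng(0)
q, k = rng.standard_normal(64), rng.standard_normal(64)

# Relative-position property: the score depends only on the offset m - n = 7.
s_near = rope_rotate(q, 10) @ rope_rotate(k, 3)
s_far = rope_rotate(q, 1007) @ rope_rotate(k, 1000)
print(np.isclose(s_near, s_far))  # True

# Long-term decay: the bound is largest at distance 0 and shrinks with distance.
for r in (0, 10, 100, 1000):
    print(r, round(decay_bound(r), 2))
```

At r = 0 every term equals 1, so the bound is the mean of 1..64, i.e. 32.5; larger distances give smaller values. This fixed, frequency-determined decay is what 3D-RPE aims to make controllable.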

Computation and Language

What problem does this paper attempt to address?

The paper proposes a new position encoding method, 3D Rotary Position Encoding (3D-RPE), to address challenges in long-context modeling, in particular the limitations of the 2D Rotary Position Encoding (RoPE) widely used in Transformer architectures. RoPE's attention scores tend to decay as the relative distance between tokens grows (long-term decay), and the position interpolation commonly used to extend RoPE to longer contexts lowers its position resolution. Inspired by the Bloch sphere representation, 3D-RPE encodes positions as rotations on a three-dimensional sphere to better model long-distance relative positional information. Unlike RoPE, 3D-RPE allows the long-term decay to be regulated within each chunk, so that information between tokens at long relative distances can still be modeled. It also improves position resolution through chunk-wise operations and the choice of rotation angles, alleviating the resolution loss that position interpolation causes in RoPE. Experiments show that 3D-RPE outperforms RoPE on long-context natural language understanding (NLU) and long-sequence language modeling (LM) tasks, with the clearest gains on long-context NLU. The main contributions of the paper are the proposal of 3D-RPE, the demonstration of its controllable long-term decay and improved position resolution, and empirical evidence of its improved performance in practical applications. Future research may further explore the potential of 3D-RPE in long-context language models.
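Two ideas from the paragraph above can be made concrete. First, position interpolation rescales each position m to m * L_train / L_target, so the rotation angle separating adjacent tokens shrinks by that same factor at every RoPE frequency; this is the loss of position resolution the summary refers to. Second, a chunk-based scheme splits an absolute position into a chunk index and an offset inside the chunk. The chunked part below is an illustrative assumption, not the paper's formulation: `chunk_coords`, `to_bloch`, `toy_3d_position`, and the choice that the offset drives the azimuth while the chunk index drives the polar angle are all hypothetical. It only shows that in-chunk offsets stay within [0, chunk_size) however long the sequence becomes, and how two angles map to a point on the unit (Bloch) sphere.

```python
import numpy as np

def rope_freqs(d, base=10000.0):
    # Standard RoPE frequencies theta_i = base^(-2i/d).
    return base ** (-np.arange(0, d, 2) / d)

def adjacent_angle_gap(d, train_len, target_len, base=10000.0):
    """Angle separating tokens m and m+1 at each frequency when positions
    are linearly interpolated by train_len / target_len."""
    scale = min(1.0, train_len / target_len)
    return rope_freqs(d, base) * scale

def chunk_coords(pos, chunk_size):
    # Hypothetical split of an absolute position into (chunk index, in-chunk offset).
    return pos // chunk_size, pos % chunk_size

def to_bloch(theta, phi):
    # Unit-sphere point from polar angle theta and azimuth phi
    # (the Bloch-sphere parameterization).
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def toy_3d_position(pos, chunk_size, max_chunks, omega=1.0):
    """Toy only: in-chunk offset sets the azimuth (RoPE-like), and the
    chunk index sets the polar angle, spread evenly over (0, pi)."""
    chunk, offset = chunk_coords(pos, chunk_size)
    theta = np.pi * (chunk + 0.5) / max_chunks
    phi = omega * offset
    return to_bloch(theta, phi)

gap_train = adjacent_angle_gap(128, 4096, 4096)
gap_8x = adjacent_angle_gap(128, 4096, 32768)
print(gap_8x[-1] / gap_train[-1])  # 0.125

_, offsets = chunk_coords(np.arange(32768), 4096)
print(offsets.max())  # 4095

v = toy_3d_position(10_000, chunk_size=4096, max_chunks=8)
print(np.linalg.norm(v))  # unit length, up to rounding
```

With an 8x context extension, interpolation leaves adjacent tokens one eighth of the angular separation they had during training. In the toy chunked layout, offsets never exceed `chunk_size - 1`, so the in-chunk angle stays in the range seen during training and only the chunk-index angle has to span the longer sequence.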
