3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding

Xindian Ma, Wenyuan Liu, Peng Zhang, Nan Xu
2024-06-14
Abstract: Inspired by the Bloch Sphere representation, we propose a novel rotary position encoding on a three-dimensional sphere, named 3D Rotary Position Encoding (3D-RPE). 3D-RPE is an advanced version of the widely used 2D Rotary Position Encoding (RoPE), with two major advantages for modeling long contexts: controllable long-term decay and improved position resolution. For controllable long-term decay, 3D-RPE allows for the regulation of long-term decay within the chunk size, ensuring the modeling of relative positional information between tokens at a distant relative position. For enhanced position resolution, 3D-RPE can mitigate the degradation of position resolution caused by position interpolation on RoPE. We have conducted experiments on long-context Natural Language Understanding (NLU) and long-sequence Language Modeling (LM) tasks. From the experimental results, 3D-RPE achieved performance improvements over RoPE, especially in long-context NLU tasks.
Computation and Language
What problem does this paper attempt to address?
The paper proposes a new position encoding method, 3D Rotary Position Encoding (3D-RPE), to address two limitations that the 2D Rotary Position Encoding (RoPE) widely used in Transformer architectures shows in long-context modeling. First, RoPE exhibits long-term decay: the attention contribution between two tokens weakens as their relative distance grows, which makes distant token relationships hard to model. Second, when position interpolation is applied to RoPE to extend the context length, position resolution degrades, because adjacent positions are mapped to rotation angles that lie closer together. Inspired by the Bloch Sphere representation, 3D-RPE encodes positions as rotations on a three-dimensional sphere. Unlike RoPE, it allows long-term decay to be regulated within the chunk size, so that relative positional information between distant tokens is still modeled. It also improves position resolution through chunk-wise operations and the choice of rotation angles, mitigating the resolution loss caused by position interpolation on RoPE. In the reported experiments, 3D-RPE outperforms RoPE on long-context natural language understanding and long-sequence language modeling tasks, with the larger gains on long-context understanding. The main contributions are the 3D-RPE method itself, the demonstration of its controllable long-term decay and improved position resolution, and the empirical improvements over RoPE. Future research may further explore the potential of 3D-RPE in long-context language models.
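To make the baseline concrete, the following is a minimal NumPy sketch of standard RoPE and of position interpolation, not of 3D-RPE itself, whose spherical formulation is not given in this summary. The function names `rope_angles` and `apply_rope`, the base of 10000, the head dimension of 64, and the 2048 and 8192 context lengths are illustrative assumptions rather than values taken from the paper. The sketch checks two properties discussed above: attention scores under RoPE depend only on the relative offset, and interpolation shrinks the angular step between adjacent positions.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Rotation angles for standard RoPE, shape (len(positions), dim // 2)."""
    # One frequency per feature pair, decreasing geometrically with the pair index.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return np.outer(positions, inv_freq)

def apply_rope(x, positions, base=10000.0):
    """Rotate consecutive feature pairs of x (seq_len, dim) by position-dependent angles."""
    theta = rope_angles(positions, x.shape[-1], base)
    cos, sin = np.cos(theta), np.sin(theta)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
dim = 64  # assumed head dimension, for illustration only
q = rng.standard_normal((1, dim))
k = rng.standard_normal((1, dim))

# Relative-position property: both pairs of positions have an offset of 7,
# so the two attention scores should agree up to floating-point error.
score_a = apply_rope(q, np.array([10.0])) @ apply_rope(k, np.array([3.0])).T
score_b = apply_rope(q, np.array([107.0])) @ apply_rope(k, np.array([100.0])).T
relative_ok = np.allclose(score_a, score_b)

# Position interpolation: squeeze 8192 positions into an assumed 2048-position
# trained range by rescaling the position index.
scale = 2048 / 8192
step_plain = rope_angles(np.array([1.0]), dim)[0]     # angle between neighbours, no scaling
step_interp = rope_angles(np.array([scale]), dim)[0]  # angle between neighbours, interpolated
resolution_ratio = step_interp / step_plain           # same factor in every feature pair
```

Under these assumptions, `resolution_ratio` equals the interpolation scale in every feature pair, which is the loss of position resolution that the paper reports 3D-RPE mitigates.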