Complex-Valued Relative Positional Encodings for Transformer

Gang Yang, Hongzhe Xu
DOI: https://doi.org/10.1109/NNICE58320.2023.10105716
2023-01-01
Abstract: The self-attention mechanism (Transformer) has recently shown its advantages in various natural language processing (NLP) tasks. Because positional information is crucial to NLP tasks, positional encoding has become a critical factor in improving the performance of the Transformer. In this paper, we present a simple but effective complex-valued relative positional encoding (CRPE) method. Specifically, we map the query and key vectors to the complex domain based on their positions. As a result, the dot product between the complex-valued query and key vectors makes the attention weights directly contain the relative positional information. To demonstrate the effectiveness of our method, we evaluate it on four typical NLP tasks: named entity recognition, text classification, machine translation, and language modeling. The datasets for these tasks comprise texts of varying lengths. In the experiments, our method outperforms the baseline positional encodings across all datasets. The results show that our method is more effective than the baselines on both long and short texts while using fewer parameters.
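The abstract does not give the exact mapping, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that each real vector is paired into complex numbers and multiplied by a position-dependent phase exp(i * pos * theta), and that the attention score is the real part of the Hermitian inner product. The names to_complex, score, and theta, and the frequency schedule, are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def to_complex(x, pos, theta):
    """Map a real vector of even length d to d/2 complex numbers,
    then rotate each by a position-dependent phase (assumed form)."""
    half = x.shape[-1] // 2
    z = x[:half] + 1j * x[half:]          # pair the two halves as real/imag parts
    return z * np.exp(1j * pos * theta)   # phase grows linearly with position

def score(q, k, m, n, theta):
    """Attention logit between a query at position m and a key at position n."""
    qc = to_complex(q, m, theta)
    kc = to_complex(k, n, theta)
    # The phases combine to exp(i * (m - n) * theta), so only the
    # relative offset m - n survives in the product.
    return np.real(np.sum(qc * np.conj(kc)))

rng = np.random.default_rng(0)
d = 8
q = rng.standard_normal(d)
k = rng.standard_normal(d)
# Hypothetical per-dimension frequencies (an assumption, not from the paper).
theta = 1.0 / (10000.0 ** (np.arange(d // 2) / (d // 2)))

s_near = score(q, k, 5, 2, theta)       # offset 3, small absolute positions
s_far = score(q, k, 105, 102, theta)    # offset 3, shifted by 100
s_other = score(q, k, 5, 1, theta)      # offset 4
```

In this sketch, s_near and s_far agree up to floating-point error because both pairs share the offset 3, while s_other generally differs. That is the sense in which the score carries relative rather than absolute positional information.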