Bidirectional Transformer with Absolute-Position Aware Relative Position Encoding for Encoding Sentences

Qi Le, Zhang Yu, Liu Ting
DOI: https://doi.org/10.1007/s11704-022-0610-2
IF: 2.6688
2022-01-01
Frontiers of Computer Science
Abstract: Transformers have been widely studied in many natural language processing (NLP) tasks. They can capture dependencies across the whole sentence with high parallelizability thanks to multi-head attention and the position-wise feed-forward network. However, these two components are position-independent, which makes transformers weak at modeling sentence structure. Existing studies commonly use positional encoding or mask strategies to capture the structural information of sentences. In this paper, we aim to strengthen the ability of transformers to model the linear structure of sentences in three respects: the absolute position of tokens, the relative distance between tokens, and the direction between tokens. We propose a novel bidirectional Transformer with absolute-position aware relative position encoding (BiAR-Transformer) that combines positional encoding and the mask strategy. We model the relative distance between tokens along with the absolute position of tokens by a novel absolute-position aware relative position encoding. Meanwhile, we apply a bidirectional mask strategy to model the direction between tokens. Experimental results on natural language inference, paraphrase identification, sentiment classification, and machine translation tasks show that BiAR-Transformer outperforms other strong baselines.
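The abstract names two ingredients, directional masks and relative distances, without giving their formulas. The sketch below shows one common way such quantities are built for a sentence of n tokens. It is an illustration under stated assumptions, not the paper's formulation: the function names, the choice to keep the diagonal in both masks, and the clipping of offsets to max_distance are all hypothetical.

```python
def directional_masks(n):
    """Return (forward, backward) boolean attention masks for n tokens.

    forward[i][j] is True when token i may attend to token j at or before it;
    backward[i][j] is True when token i may attend to token j at or after it.
    Keeping the diagonal in both masks is an assumption of this sketch.
    """
    forward = [[j <= i for j in range(n)] for i in range(n)]
    backward = [[j >= i for j in range(n)] for i in range(n)]
    return forward, backward


def relative_distances(n, max_distance):
    """Return signed offsets j - i, clipped to [-max_distance, max_distance].

    Clipping is a common convention for relative position indices and is
    assumed here; the abstract does not say whether the paper clips.
    """
    return [
        [max(-max_distance, min(max_distance, j - i)) for j in range(n)]
        for i in range(n)
    ]


forward, backward = directional_masks(4)
offsets = relative_distances(4, max_distance=2)
```

In a model of this kind, the two masks would typically gate separate attention passes (left-to-right and right-to-left), while the offset matrix would index a table of learned relative-position embeddings. How BiAR-Transformer injects the absolute position into that encoding is not specified in the abstract and is left out of the sketch.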