Phrase-level Self-Attention Networks for Universal Sentence Encoding

Wei Wu, Houfeng Wang, Tianyu Liu, Shuming Ma
DOI: https://doi.org/10.18653/v1/d18-1408
2018-01-01
Abstract: Universal sentence encoding is a hot topic in recent NLP research. The attention mechanism has been an integral part of many sentence encoding models, allowing them to capture context dependencies regardless of the distance between elements in the sequence. Fully attention-based models have recently attracted enormous interest because their computation is highly parallelizable and they require significantly less training time. However, the memory consumption of these models grows quadratically with sentence length, and syntactic information is neglected. To address both issues, we propose Phrase-level Self-Attention Networks (PSAN), which perform self-attention across the words inside a phrase to capture context dependencies at the phrase level, and use a gated memory updating mechanism to refine each word's representation hierarchically with the longer-term context dependencies captured in a larger phrase. As a result, memory consumption is reduced because self-attention is performed at the phrase level instead of the sentence level. At the same time, syntactic information can be easily integrated into the model. Experimental results show that PSAN achieves state-of-the-art transfer performance across a range of NLP tasks, including sentence classification, natural language inference, and sentence textual similarity.
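
The memory argument in the abstract can be illustrated with a minimal sketch. It assumes plain dot-product attention (swapped in for simplicity, not necessarily the attention function used in the paper), assumes the phrase spans partition the sentence, and omits the gated memory update entirely. The names `phrase_self_attention` and `score_entries` are hypothetical helpers introduced only for this illustration.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def phrase_self_attention(vectors, phrases):
    """Dot-product self-attention restricted to each phrase span.

    vectors: list of word vectors (lists of floats).
    phrases: list of half-open (start, end) spans, assumed to
             partition the sentence (an assumption of this sketch).
    """
    out = [None] * len(vectors)
    for start, end in phrases:
        span = vectors[start:end]
        for i, query in enumerate(span):
            # Each word attends only to the words in its own phrase.
            scores = [sum(a * b for a, b in zip(query, key)) for key in span]
            weights = softmax(scores)
            out[start + i] = [
                sum(w * v[d] for w, v in zip(weights, span))
                for d in range(len(query))
            ]
    return out


def score_entries(phrases):
    # Number of attention scores computed: one square block per phrase.
    return sum((end - start) ** 2 for start, end in phrases)


# A 6-word sentence split into a 2-word and a 4-word phrase.
print(score_entries([(0, 2), (2, 6)]))  # 20 scores at the phrase level
print(score_entries([(0, 6)]))          # 36 scores at the sentence level
```

With sentence-level attention the score matrix has n² entries, whereas phrase-level attention needs only the sum of the squared phrase lengths, which is the source of the memory saving the abstract describes.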