Context Modeling with Hierarchical Shallow Attention Structure for Document-Level NMT
Jianming Guo, Xinran Chen, Zihan Liu, Weijie Yuan, Jianshen Zhang, Gongshen Liu
DOI: https://doi.org/10.1109/ijcnn55064.2022.9891996
2022-01-01
Abstract: Neural machine translation (NMT) is widely acknowledged to benefit from context information. Nevertheless, context-aware NMT still faces several challenges. First, effectively exploiting the valuable information contained in the context remains difficult. Second, as the number of sentences increases, the parameter count of context-aware NMT models grows sharply, which raises computational cost and hinders transfer to other translation tasks. To address these problems, we propose a hierarchical shallow attention structure for document-level NMT. We employ hierarchical encoders to extract both sentence-level and context-level information. In the decoding phase, the integrated hierarchical attention is combined with the self-attention over the target sentences. We further employ shallow attention to reduce model complexity. Assessments of several linguistic phenomena demonstrate that the proposed approach balances model complexity and translation performance while achieving state-of-the-art (SOTA) BLEU scores on several translation tasks.
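
The idea of attending hierarchically over context can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify layer counts, dimensions, or how the levels are combined, so the function names (`attend`, `hierarchical_context`), the single-head dot-product attention, and the use of one attention step per level as a stand-in for "shallow" attention are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    Returns the attended vector and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


def hierarchical_context(query, context_sentences):
    """Hypothetical two-level attention over context.

    Level 1: the query attends over the word vectors of each context
             sentence, giving one summary vector per sentence.
    Level 2: the query attends over those sentence summaries.
    One attention step per level is an assumed stand-in for "shallow".
    """
    summaries = [attend(query, sent, sent)[0] for sent in context_sentences]
    return attend(query, summaries, summaries)


# Toy example: a 2-dimensional query and two context sentences.
query = [1.0, 0.0]
context = [
    [[1.0, 0.0], [0.0, 1.0]],  # sentence 1: two word vectors
    [[0.5, 0.5]],              # sentence 2: one word vector
]
context_vector, sentence_weights = hierarchical_context(query, context)
```

In this sketch `context_vector` would be the context representation handed to the decoder, and `sentence_weights` shows how much each context sentence contributes. How the paper actually merges this with the decoder's self-attention is not stated in the abstract.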