Transformer with Layer Fusion and Interaction

Yan Mingming, You Shuai, Sun Jianing, Gao Xiang, Lu Biao, Huang Wenxian
DOI: https://doi.org/10.1109/iccwamtip56608.2022.10016501
2022-01-01
Abstract: Neural machine translation models, such as Recurrent Neural Networks, Long Short-Term Memory networks, and the Transformer, are widely used in many translation tasks. These models often stack multiple layers in a deep encoder and decoder to capture more semantic information. However, they use only the top layer of the encoder and of the decoder when processing a sequence, which discards useful information from the other layers. In this paper, we propose an approach that combines two strategies. First, we propose a strategy called layer fusion to make full use of all layers. Second, we introduce hierarchical interaction between the encoder and the decoder by using the output of each encoder layer as part of the input to the corresponding decoder layer. We take the Transformer as an example in this work since it has achieved the best performance. Experimental results on the WMT14 English-German translation data demonstrate the effectiveness of the proposed approach, which is also applicable to other models.
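The two strategies can be illustrated with a toy sketch. The abstract does not say how layers are fused or how an encoder output enters a decoder layer, so everything below is an assumption made for illustration: fusion is a normalized weighted sum of all layer outputs, and interaction is an element-wise addition standing in for whatever mechanism the paper actually uses (in a real Transformer this would more plausibly go through cross-attention). The names `scale`, `fuse_layers`, `run_encoder`, and `run_decoder` are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def scale(k: float) -> Layer:
    """Toy stand-in for a real encoder/decoder layer: multiply every entry by k."""
    return lambda v: [k * x for x in v]


def fuse_layers(outputs: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Assumed fusion rule (not from the paper): normalized weighted sum of layer outputs."""
    total = sum(weights)
    dim = len(outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, outputs)) / total
        for i in range(dim)
    ]


def run_encoder(x: Vector, layers: Sequence[Layer]) -> List[Vector]:
    """Run the stack and keep every layer's output, not only the top one."""
    outputs: List[Vector] = []
    h = x
    for layer in layers:
        h = layer(h)
        outputs.append(h)
    return outputs


def run_decoder(
    y: Vector, layers: Sequence[Layer], encoder_outputs: Sequence[Vector]
) -> List[Vector]:
    """Hierarchical interaction: decoder layer i also receives encoder layer i's output."""
    outputs: List[Vector] = []
    h = y
    for layer, enc in zip(layers, encoder_outputs):
        # Assumed combination: add the matching encoder output to the decoder state.
        h = layer([a + b for a, b in zip(h, enc)])
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    enc_outs = run_encoder([1.0, 2.0], [scale(2.0), scale(0.5)])
    dec_outs = run_decoder([0.0, 0.0], [scale(1.0), scale(1.0)], enc_outs)
    print("fused encoder state:", fuse_layers(enc_outs, [1.0, 1.0]))
    print("fused decoder state:", fuse_layers(dec_outs, [1.0, 1.0]))
```

The point of the sketch is structural: every layer's output is retained and contributes to the final representation, and the encoder and decoder are paired layer by layer rather than connected only through the encoder's top layer.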