Searching Better Architectures for Neural Machine Translation
Yang Fan,Fei Tian,Yingce Xia,Tao Qin,Xiang-Yang Li,Tie-Yan Liu
DOI: https://doi.org/10.1109/taslp.2020.2995270
2020-01-01
IEEE/ACM Transactions on Audio Speech and Language Processing
Abstract:Neural architecture search (NAS) has played important roles in the evolution of neural architectures. However, no much attention has been paid to improve neural machine translation (NMT) through NAS approaches. In this work, we propose a gradient-based NAS algorithm for NMT, which automatically discovers architectures with better performances. Compared with previous NAS work, we jointly search the network operations (e.g., LSTM, CNN, self-attention etc) as well as dropout rates to ensure better results. We show that with reasonable resources it is possible to discover novel neural network architectures for NMT, which achieve consistently better performances than Transformer [1], the state-of-the-art NMT model, across different tasks. On WMT'14 English-to-German translation, IWSLT'14 German-to-English translation and WMT'18 Finnish-to-English translation tasks, our discovered architectures could obtain 30.1, 36.1 and 26.4 BLEU scores, which are great improvement over Transformer baselines. We also empirically verify that the discovered model on one task can be transferred to other tasks.