Genetic Algorithm-based Transformer Architecture Design for Neural Machine Translation.

Jie Wu, Ben Feng, Yanan Sun
DOI: https://doi.org/10.1145/3568199.3568215
2022-01-01
Abstract: Great progress on Neural Machine Translation (NMT) tasks has been achieved by transformer models in recent years, largely owing to the careful design of the multi-head attention and feed-forward neural network layers in their encoder-decoder architecture. However, these layers are often manually designed by experts, which is time-consuming and makes it hard to explore potentially promising ordering patterns. In this paper, an automatic transformer architecture design algorithm based on a genetic algorithm is proposed to evolve the optimal transformer architecture for NMT tasks. In particular, a novel gene encoding strategy is developed to enable transformer architectures with various layer ordering patterns and hyper-parameters, and effective genetic operators are then designed to perform an efficient evolutionary search for the optimal architecture. To validate the effectiveness of the proposed algorithm, experiments are conducted on a widely used machine translation benchmark, and the results show that the model automatically searched by the proposed algorithm outperforms the vanilla transformer under different model sizes.
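The abstract does not spell out the gene encoding or the genetic operators, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each gene is a (layer type, hyper-parameter) pair, uses one-point crossover and per-gene mutation, and replaces real training with a toy fitness function. All names and value ranges (`HEAD_CHOICES`, `FFN_DIM_CHOICES`, `toy_fitness`, `evolve`) are hypothetical.

```python
import random

# Hypothetical search space; the paper's actual choices are not given in the abstract.
HEAD_CHOICES = [4, 8, 16]             # candidate numbers of attention heads
FFN_DIM_CHOICES = [1024, 2048, 4096]  # candidate feed-forward hidden sizes


def random_gene(rng):
    """One gene describes one layer: (layer type, hyper-parameter value)."""
    if rng.random() < 0.5:
        return ("attn", rng.choice(HEAD_CHOICES))
    return ("ffn", rng.choice(FFN_DIM_CHOICES))


def random_individual(rng, n_layers):
    """An individual is an ordered list of genes, i.e. a layer ordering."""
    return [random_gene(rng) for _ in range(n_layers)]


def crossover(parent_a, parent_b, rng):
    """One-point crossover: prefix from one parent, suffix from the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(individual, rng, rate=0.1):
    """Resample each gene independently with probability `rate`."""
    return [random_gene(rng) if rng.random() < rate else gene
            for gene in individual]


def toy_fitness(individual):
    """Placeholder score that counts adjacent layers of different types.

    A real search would build and train the encoded model and return a
    translation-quality score on validation data instead.
    """
    return sum(1 for a, b in zip(individual, individual[1:]) if a[0] != b[0])


def evolve(fitness, n_layers=12, pop_size=20, generations=10, seed=0):
    """Simple elitist loop: keep the better half, refill with offspring."""
    rng = random.Random(seed)
    population = [random_individual(rng, n_layers) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve(toy_fitness)
    print(best)
```

Because the better half of each generation is carried over unchanged, the best fitness in the population never decreases under this sketch. Swapping `toy_fitness` for a function that trains and evaluates the encoded model is where nearly all of the compute in a real search would go.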