Evolving transformer architecture for neural machine translation

Ben Feng, Dayiheng Liu, Yanan Sun
DOI: https://doi.org/10.1145/3449726.3459441
2021-01-01
Abstract: Transformer models have achieved great success on neural machine translation tasks in recent years. However, the hyper-parameters of the transformer are often designed manually by experts, and the layers are usually stacked in a regular pattern without exploring potentially promising ordering patterns. In this paper, we propose a transformer architecture design algorithm based on a genetic algorithm, which can automatically find a suitable layer ordering pattern and hyper-parameters for the task at hand. The experimental results show that the models designed by the proposed algorithm outperform the vanilla transformer on a widely used machine translation benchmark, which indicates that the performance of the transformer architecture can be improved by adjusting the layer ordering pattern and hyper-parameters with the proposed algorithm.
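The abstract does not give the details of the search, so the following is only a minimal sketch of how a genetic algorithm over layer ordering and hyper-parameters might look. The layer vocabulary, the hidden-size choices, the selection scheme, and `toy_fitness` are all assumptions made for illustration, not the authors' method. In a real run the fitness would come from training each candidate and scoring it on held-out translation data.

```python
import random

# Hypothetical search space; the paper's actual encoding is not given in the abstract.
LAYER_TYPES = ("self_attention", "feed_forward")
HIDDEN_SIZES = (256, 512, 1024)


def random_individual(rng, num_layers=6):
    """Sample a random layer ordering plus one hyper-parameter."""
    return {
        "layers": [rng.choice(LAYER_TYPES) for _ in range(num_layers)],
        "hidden_size": rng.choice(HIDDEN_SIZES),
    }


def crossover(rng, a, b):
    """One-point crossover on the layer list; hidden size is taken from either parent."""
    cut = rng.randrange(1, len(a["layers"]))
    return {
        "layers": a["layers"][:cut] + b["layers"][cut:],
        "hidden_size": rng.choice((a["hidden_size"], b["hidden_size"])),
    }


def mutate(rng, ind, rate=0.1):
    """Resample each gene independently with probability `rate`."""
    layers = [
        rng.choice(LAYER_TYPES) if rng.random() < rate else layer
        for layer in ind["layers"]
    ]
    hidden = rng.choice(HIDDEN_SIZES) if rng.random() < rate else ind["hidden_size"]
    return {"layers": layers, "hidden_size": hidden}


def evolve(fitness, pop_size=20, generations=10, seed=0):
    """Truncation selection: keep the top half, refill with mutated offspring."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [
            mutate(rng, crossover(rng, *rng.sample(parents, 2)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(ind):
    """Placeholder score (assumption): stands in for a validation metric."""
    alternations = sum(a != b for a, b in zip(ind["layers"], ind["layers"][1:]))
    return alternations - abs(ind["hidden_size"] - 512) / 512


best = evolve(toy_fitness)
print(best["layers"], best["hidden_size"])
```

Because the top half of each generation is carried over unchanged, the best score in this sketch never decreases from one generation to the next. Swapping `toy_fitness` for a function that trains and evaluates a model is where nearly all of the cost of such a search would lie.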