Non-iterative Parallel Text Generation via Glancing Transformer

Lihua Qian,Hao Zhou,Yu Bao,Mingxuan Wang,Lin Qiu,Weinan Zhang,Yong Yu,Lei Li
2021-01-01
Abstract:Although non-autoregressive models with one-iteration generation achieve remarkable inference speed-up, they still fall behind their autoregressive counterparts in prediction accuracy. The non-autoregressive models with the best accuracy currently rely on multiple decoding iterations, which largely sacrifice the inference speed of non-autoregressive models. Inspired by the way of learning word dependencies in autoregressive and iterative-decoding models, we propose Glancing Transformer (GLAT) with a glancing language model (GLM), which learns to capture the word dependency gradually. Experiments on three benchmarks demonstrate that our approach can significantly improve the accuracy of non-autoregressive models without multiple decoding iterations. In particular, GLAT achieves state-of-the-art results among non-iterative models and even outperforms top iterative counterparts in some specific benchmarks.
What problem does this paper attempt to address?