Unlocking the Power of GANs in Non-Autoregressive Text Generation

Da Ren, Yi Cai, Qing Li
2024-10-02
Abstract: Generative Adversarial Networks (GANs) have been studied in text generation to tackle the exposure bias problem. Despite their remarkable development, they adopt autoregressive structures and therefore suffer from high latency in both the training and inference stages. Although GANs have the potential to support efficient generation by adopting non-autoregressive (NAR) structures, their exploration in NAR models is extremely limited. In this work, we conduct a pioneering study of building language GANs based on NAR structures. We identify two issues that constrain the performance of GAN-based NAR models. First, existing methods of incorporating latent variables provide highly similar representations that cannot describe the diversity of different words in sentences. We tackle this problem by proposing Position-Aware Self-Modulation, which provides more diverse and effective representations. Second, the attention mechanism in the Transformer cannot accurately build word dependencies during the unstable training of GANs, so we adopt a Dependency Feed Forward Network to enhance the model's capacity for dependency modeling. Armed with these two facilities, we propose a GAN-based NAR model, the Adversarial Non-autoregressive Transformer (ANT). The experimental results demonstrate that ANT can achieve performance comparable to mainstream models in a single forward pass and has great potential in applications such as latent interpolation and semi-supervised learning.
Computation and Language
What problem does this paper attempt to address?
The main problems this paper addresses are the high latency of existing language Generative Adversarial Networks (GANs) built on autoregressive (AR) structures, and the performance limitations of GAN-based non-autoregressive (NAR) models. Specifically:

1. **High-Latency Issue**: Existing language GANs adopt an autoregressive structure and rely on previously generated words during both training and inference, which prevents parallel computation and results in relatively high latency.
2. **NAR Model Performance Issues**:
   - Existing methods of incorporating latent variables provide highly similar representations that cannot describe the diversity of different words in a sentence, leading to inaccurate generated sentences.
   - The attention mechanism in the Transformer cannot accurately build word dependencies during the unstable training of GANs, resulting in ungrammatical generated sentences.

To solve these problems, the authors propose a language generation model based on the NAR structure, the Adversarial Non-autoregressive Transformer (ANT). This model introduces two key improvements:

- **Position-Aware Self-Modulation**: By providing diverse hidden representations for words at different positions, it lets the model generate more diverse sentences.
- **Dependency Feed Forward Network (Dependency FFN)**: It strengthens the model's ability to capture word dependencies, so that these dependencies are maintained even during the unstable training of GANs.

Experimental results show that ANT achieves performance comparable to mainstream models in a single forward pass and exhibits significantly lower latency in both unconditional and conditional generation tasks. In addition, ANT shows great potential in applications such as semi-supervised learning and latent interpolation.

### Summary

The main contributions of the paper are as follows:

1. Proposing ANT, a language GAN based on the NAR structure, which addresses the high latency of existing AR language GANs.
2. Introducing two modules, Position-Aware Self-Modulation and Dependency FFN, which respectively address the overly similar latent variable representations and the unreliable modeling of word dependencies.
3. Experimentally verifying the effectiveness and efficiency of ANT on multiple tasks, demonstrating its potential for low-latency generation and multi-task applications.

These improvements not only enhance the performance of NAR models but also provide new directions for future research.
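
The idea behind Position-Aware Self-Modulation can be illustrated with a small sketch. This page does not give the paper's actual formulation, so everything below is an assumption made for illustration: the modulation is modeled as a layer-norm scale and shift computed from the latent variable, and "position-aware" is modeled by adding a sinusoidal positional encoding to the latent variable before computing them. The names `modulation_params`, `modulate`, `w_gamma`, and `w_beta` are hypothetical. The point is only to show why a single latent vector yields identical modulation at every position, while a position-dependent input yields distinct ones.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rng, rows, cols, scale=0.5):
    """Random weights standing in for learned parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sinusoidal_position(pos, dim):
    """Sinusoidal positional encoding for a single position."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def modulation_params(z, w_gamma, w_beta, position=None):
    """Map the latent variable z to a (scale, shift) pair.

    position=None: plain self-modulation, identical at every position.
    position=int:  ASSUMED position-aware variant, where z is summed with a
                   positional encoding first (not taken from the paper).
    """
    source = z
    if position is not None:
        source = [a + b for a, b in zip(z, sinusoidal_position(position, len(z)))]
    gamma = [1.0 + g for g in linear(w_gamma, source)]
    beta = linear(w_beta, source)
    return gamma, beta


def modulate(hidden, gamma, beta):
    """Apply the scale and shift to the layer-normalized hidden state."""
    return [g * h + b for g, h, b in zip(gamma, layer_norm(hidden), beta)]


rng = random.Random(0)
dim, seq_len = 8, 4
z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
w_gamma = random_matrix(rng, dim, dim)
w_beta = random_matrix(rng, dim, dim)

# One (scale, shift) pair per position, with and without position awareness.
plain = [modulation_params(z, w_gamma, w_beta) for _ in range(seq_len)]
aware = [modulation_params(z, w_gamma, w_beta, position=p) for p in range(seq_len)]

plain_identical = all(p == plain[0] for p in plain)
aware_distinct = all(
    aware[i] != aware[j] for i in range(seq_len) for j in range(i + 1, seq_len)
)

# Modulate one example hidden state with the parameters for position 2.
hidden = [rng.gauss(0.0, 1.0) for _ in range(dim)]
modulated = modulate(hidden, *aware[2])

print("plain modulation identical across positions:", plain_identical)
print("position-aware modulation distinct across positions:", aware_distinct)
```

Because all positions are modulated independently of previously generated words, such a layer is compatible with producing every position in one parallel forward pass, which is the property the NAR structure relies on.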