Text Feature Adversarial Learning for Text Generation With Knowledge Transfer From GPT2

Hao Zhang, Yulai Cong, Zhengjue Wang, Lei Zhang, Miaoyun Zhao, Liqun Chen, Shijing Si, Ricardo Henao, Lawrence Carin
DOI: https://doi.org/10.1109/tnnls.2022.3210975
IF: 14.255
2022-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Text generation is a key component of many natural language tasks. Motivated by the success of generative adversarial networks (GANs) for image generation, many text-specific GANs have been proposed. However, due to the discrete nature of text, these text GANs often use reinforcement learning (RL) or continuous relaxations to calculate gradients during learning, leading to high-variance or biased estimation. Furthermore, the existing text GANs often suffer from mode collapse (i.e., they have limited generative diversity). To tackle these problems, we propose a new text GAN model named text feature GAN (TFGAN), where adversarial learning is performed in a continuous text feature space. In the adversarial game, GPT2 provides the "true" features, while the generator of TFGAN learns from them. TFGAN is trained by maximum likelihood estimation on text space and adversarial learning on text feature space, effectively combining them into a single objective, while alleviating mode collapse. TFGAN achieves appealing performance in text generation tasks, and it can also be used as a flexible framework for learning text representations.
computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, hardware & architecture
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address the following issues:

1. **Challenges in Text Generation**:
   - **Discreteness Problem**: Because text is discrete, existing text generative adversarial networks (text GANs) must rely on reinforcement learning (RL) or continuous relaxations to compute gradients during training, which leads to high-variance or biased estimates.
   - **Mode Collapse Problem**: Existing text GANs often suffer from mode collapse, so the generated text lacks diversity.
2. **Limitations of Existing Methods**:
   - **Pre-training Requirement**: Many text GANs must be pre-trained with maximum likelihood estimation (MLE) to improve stability and reliability.
   - **Insufficient Performance**: Some studies train text GANs from scratch, but their performance is far below that of state-of-the-art models such as GPT-2.
3. **Need for Knowledge Transfer**:
   - A method is needed to transfer the rich knowledge of large pre-trained models (such as GPT-2) to smaller models, reducing model size and speeding up inference.

### Solution

The paper proposes a new text GAN model called Text Feature GAN (TFGAN), which has the following features:

1. **Adversarial Learning in Continuous Feature Space**:
   - TFGAN performs adversarial learning in the continuous text feature space instead of the discrete text space, which avoids the gradient-computation difficulties posed by discrete random variables.
2. **Combining the Advantages of MLE and Adversarial Learning**:
   - TFGAN applies MLE in the discrete text space and adversarial learning in the continuous text feature space, so that the two methods complement each other and the generated text is both high-quality and diverse.
3. **Knowledge Transfer**:
   - TFGAN transfers knowledge to a smaller model by taking the "real" text features from GPT-2, which reduces model size and improves inference speed. TFGAN is 3.25 times smaller than GPT-2 and 11.5 times faster.
4. **Flexible Framework**:
   - TFGAN can be applied as a plug-in module to other architectures, such as text feature learning based on convolutional autoencoders (AEs).

With these improvements, TFGAN can be trained efficiently and stably, generating high-quality and diverse text.
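
The combination of MLE in text space with adversarial learning in feature space can be illustrated with a small sketch. The summary does not give the paper's exact loss, weighting, or discriminator design, so everything below is an assumption: it uses the standard non-saturating GAN losses, a hypothetical weight `adv_weight`, and hypothetical function names. The discriminator scores stand in for the output of a critic that compares teacher (GPT-2) features with generator features.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one vocabulary-sized vector.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def mle_loss(step_logits, target_ids):
    # Mean negative log-likelihood of the reference tokens (discrete text space).
    nll = [-log_softmax(logits)[t] for logits, t in zip(step_logits, target_ids)]
    return sum(nll) / len(nll)

def discriminator_loss(real_scores, fake_scores):
    # Standard GAN discriminator loss on feature-space scores (assumed form).
    # real_scores: scores for teacher features; fake_scores: scores for generator features.
    real = -sum(log_sigmoid(s) for s in real_scores) / len(real_scores)
    fake = -sum(log_sigmoid(-s) for s in fake_scores) / len(fake_scores)
    return real + fake

def generator_adv_loss(fake_scores):
    # Non-saturating generator loss (assumed form). Because the scores depend on
    # continuous features, no sampling of discrete tokens sits in the gradient path.
    return -sum(log_sigmoid(s) for s in fake_scores) / len(fake_scores)

def generator_objective(step_logits, target_ids, fake_scores, adv_weight=1.0):
    # Single objective: text-space MLE plus weighted feature-space adversarial term.
    # adv_weight is a hypothetical hyperparameter, not a value from the paper.
    return mle_loss(step_logits, target_ids) + adv_weight * generator_adv_loss(fake_scores)

if __name__ == "__main__":
    logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]  # two time steps, vocabulary of 3
    targets = [0, 1]
    fake = [-0.4, 0.2]                            # critic scores for generator features
    print(generator_objective(logits, targets, fake, adv_weight=0.5))
```

In a real training loop these quantities would be tensors in an autodiff framework, and the generator and discriminator would be updated in alternation; the sketch only shows how the two terms combine into one scalar.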