Fixed global memory for controllable long text generation

Zheng Chen,Zhejun Liu
DOI: https://doi.org/10.1007/s10489-022-04197-6
IF: 5.3
2022-10-21
Applied Intelligence
Abstract:Long text generation is a challenging yet unsolved task. To generate long, coherent, and consistent text, existing approaches need to increase the language model length accordingly. However, the cost of the computational and memory resources grows as the square of the length. Even trained with thousands of GPUs, the length of language models is still limited to a few thousand, which may cause the generation of longer texts to be inconsistent with the topics and ideas in their preceding texts. To address this, we propose a novel Transformer architecture called Transformer with Local and Global Memory (Transformer LGM). It is inspired by the way people write long articles, which generate a key idea first and then guide the writing of the entire article with the idea in mind. Such a "key idea" can be put into the fixed global memory of the Transformer LGM to guide the whole generation process. On the contrary, the local memory, which is responsible for local coherence, could shift and drop with the increasing length of the generated text. We implement the global memory by introducing a negative positional embedding, while the traditional positive positional embedding is still used for the local memory. Experiments show that by utilizing the global memory, our model could generate long, coherent, and consistent text without enlarging the length of the language model.
computer science, artificial intelligence
What problem does this paper attempt to address?