ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models

Haotian Luo, Kunming Wu, Cheng Dai, Sixian Ding, Xinhao Chen
2023-11-03
Abstract: RNN-like language models have received renewed attention from NLP researchers in recent years, and several models have made significant progress, demonstrating performance comparable to traditional transformers. However, due to the recurrent nature of RNNs, this kind of language model can only store information in a set of fixed-length state vectors. As a consequence, despite many improvements and optimizations, they still suffer from forgetfulness when given complex instructions or prompts. Since prompted generation is the main and most important function of LMs, solving the problem of forgetting during generation is of vital importance. In this paper, focusing on easing prompt forgetting during generation, we propose an architecture that teaches the model to memorize the prompt during generation by means of synthetic gradients. To force the model to memorize the prompt, we derive the states that encode the prompt and then transform them into a modification of the model parameters using low-rank gradient approximation, which temporarily hard-codes the prompt into the model parameters. We construct a dataset for experiments, and the results demonstrate the effectiveness of our method in solving the problem of forgetfulness in prompted generation. We will release all the code upon acceptance.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the tendency of language models to forget the prompt during generation. RNN-like language models are recurrent by design, so they can only store information in a set of fixed-length state vectors. As a result, they still forget parts of complex instructions or prompts. The problem is most visible with multi-stage prompts, that is, prompts containing several sub-instructions or steps: after generating a relatively long passage, the model may lose track of the requirements stated in the later parts of the prompt. For example, given the prompt "Write a story about Tom, and this story should have a tragic ending", the model may forget to write a tragic ending once it has produced a long piece of text. The paper calls this phenomenon "prompt forgetting". To alleviate it, the authors propose an architecture named ProSG (Prompt Synthetic Gradient). ProSG uses synthetic gradients to temporarily encode the prompt information into the model parameters, which improves generation quality and reduces prompt forgetting. Concretely, the method derives the states that encode the prompt and uses low-rank gradient approximation to convert them into a modification of the model parameters, forcing the model to retain the prompt information throughout generation. On a dataset constructed by the authors, the reported experimental results show that the method is effective against prompt forgetting and improves performance on multi-stage prompt tasks.
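The idea of temporarily hard-coding a prompt into the parameters through a low-rank modification can be illustrated with a small sketch. This is not the authors' implementation: the summary does not specify how ProSG produces its synthetic gradient, so the example assumes a rank-1 update built from two hypothetical projection matrices (`proj_u`, `proj_v`) applied to a prompt state vector, with a hypothetical step size `scale`. All names, dimensions, and random values are illustrative only.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def outer(u, v):
    """Outer product u v^T, which is a rank-1 matrix."""
    return [[a * b for b in v] for a in u]


def add_scaled(base, delta, scale):
    """Return base + scale * delta, element-wise."""
    return [[a + scale * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(base, delta)]


def synthetic_low_rank_update(prompt_state, proj_u, proj_v):
    """Map a prompt state to a rank-1 weight modification.

    Assumption: u = proj_u @ s and v = proj_v @ s are the two low-rank
    factors; the actual ProSG parameterization may differ.
    """
    u = matvec(proj_u, prompt_state)
    v = matvec(proj_v, prompt_state)
    return outer(u, v)


rng = random.Random(0)
d_state, d_out, d_in = 4, 3, 5  # illustrative sizes


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Stand-in for the state that encodes the prompt.
prompt_state = [rng.uniform(-1, 1) for _ in range(d_state)]
proj_u = rand_matrix(d_out, d_state)   # hypothetical projection
proj_v = rand_matrix(d_in, d_state)    # hypothetical projection
weights = rand_matrix(d_out, d_in)     # one layer's weight matrix

scale = 0.1  # hypothetical step size
delta = synthetic_low_rank_update(prompt_state, proj_u, proj_v)

# Apply the modification for the duration of generation ...
patched = add_scaled(weights, delta, scale)
# ... and undo it afterwards, so the change is only temporary.
restored = add_scaled(patched, delta, -scale)
```

Storing only the two factors costs `d_out + d_in` numbers per layer instead of `d_out * d_in`, which is the usual motivation for a low-rank form. Subtracting the same update after generation restores the original weights up to floating-point rounding.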