Abstract: RNN-like language models have received renewed attention from NLP researchers in recent years, and several models have made significant progress, demonstrating performance comparable to traditional transformers. However, due to the recurrent nature of RNNs, this kind of language model can only store information in a set of fixed-length state vectors. As a consequence, despite many improvements and optimizations, they still suffer from forgetfulness when given complex instructions or prompts. Since prompted generation is the main and most important function of LMs, solving the problem of forgetting during generation is of vital importance. In this paper, focusing on easing prompt forgetting during generation, we propose an architecture that teaches the model to memorize the prompt during generation by means of a synthetic gradient. To force the model to memorize the prompt, we derive the states that encode the prompt, then transform them into a modification of the model parameters using low-rank gradient approximation, which temporarily hard-codes the prompt into the model parameters. We construct a dataset for experiments, and the results demonstrate the effectiveness of our method in solving the problem of forgetfulness during prompted generation. We will release all the code upon acceptance.

What problem does this paper attempt to address?

This paper addresses the tendency of language models to forget their prompts during generation. RNN-like language models are recurrent, so they can only store information in a set of fixed-length state vectors. As a result, they can still forget parts of complex instructions or prompts. The problem is most visible with multi-stage prompts, that is, prompts containing several sub-instructions or steps: after generating a relatively long passage, the model may forget the requirements stated in the later parts of the prompt. For example, given the prompt "Write a story about Tom, and this story should have a tragic ending", the model may forget to write a tragic ending after generating a long piece of content. This phenomenon is called "prompt forgetting". To alleviate it, the authors propose an architecture named ProSG (Prompt Synthetic Gradient). ProSG uses a synthetic gradient to temporarily encode the prompt information into the model parameters, which improves generation quality and reduces prompt forgetting. The method derives a synthetic gradient from the states that encode the prompt and uses low-rank gradient approximation to convert it into a temporary modification of the model parameters, forcing the model to remember the prompt throughout generation. Experiments on a dataset constructed by the authors indicate that the method is effective against prompt forgetting and improves model performance on multi-stage prompt tasks.
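The idea of turning a prompt state into a temporary low-rank parameter modification can be illustrated with a small sketch. This is not the authors' implementation: the toy tanh RNN, the projection matrices `P_u` and `P_v`, the function names, and all sizes are assumptions made here purely for illustration, and the projections are random rather than trained.

```python
import numpy as np

def encode_prompt(prompt_embeddings, W_in, W_rec):
    """Run a toy tanh RNN over the prompt and return the final hidden state."""
    h = np.zeros(W_rec.shape[0])
    for x in prompt_embeddings:
        h = np.tanh(W_in @ x + W_rec @ h)
    return h

def low_rank_delta(state, P_u, P_v, rank):
    """Map a prompt state to a weight update U @ V.T of rank at most `rank`.

    P_u and P_v are hypothetical projections (they would be learned in a
    real system); here they only illustrate the shape of the computation.
    """
    d_out = P_u.shape[0] // rank
    d_in = P_v.shape[0] // rank
    U = (P_u @ state).reshape(d_out, rank)
    V = (P_v @ state).reshape(d_in, rank)
    return U @ V.T

rng = np.random.default_rng(0)
d_emb, d_hid, rank = 8, 16, 2  # illustrative sizes, not from the paper

W_in = rng.normal(scale=0.1, size=(d_hid, d_emb))
W_rec = rng.normal(scale=0.1, size=(d_hid, d_hid))
P_u = rng.normal(scale=0.1, size=(d_hid * rank, d_hid))
P_v = rng.normal(scale=0.1, size=(d_hid * rank, d_hid))

# Stand-in for the embedded prompt tokens (5 tokens).
prompt = rng.normal(size=(5, d_emb))

state = encode_prompt(prompt, W_in, W_rec)
delta = low_rank_delta(state, P_u, P_v, rank)

# Temporary, prompt-specific weights; the base weights W_rec are left
# unchanged so the modification can be discarded after generation.
W_rec_prompted = W_rec + delta
```

Because the update is built as a product of two thin matrices, only `2 * d_hid * rank` values need to be produced per prompt instead of a full `d_hid * d_hid` matrix, which is the usual motivation for a low-rank form.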

ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models
