Personalized LLM Response Generation with Parameterized Memory Injection

Kai Zhang, Lizhi Qing, Yangyang Kang, Xiaozhong Liu
2024-06-11
Abstract: Large Language Models (LLMs) have exhibited remarkable proficiency in comprehending and generating natural language. Personalized LLM response generation, in turn, holds the potential to offer substantial benefits for individuals in critical areas such as medicine. Existing research has explored memory-augmented methods that prompt the LLM with pre-stored user-specific knowledge to generate personalized responses to new queries. We contend that such a paradigm is unable to perceive fine-grained information. In this study, we propose a novel Memory-injected approach that uses parameter-efficient fine-tuning (PEFT) together with a Bayesian Optimisation search strategy to achieve LLM Personalization (MiLP).
Computation and Language
What problem does this paper attempt to address?
This paper addresses how to integrate user information into large language models (LLMs) effectively so that they generate personalized responses. Existing methods have limitations: text prompting is constrained by the context window of LLMs, and memory-augmented methods may fail to capture fine-grained information. Inspired by biological memory mechanisms, the authors propose a parameterized memory injection method (MiLP) that combines parameter-efficient fine-tuning (PEFT) with a Bayesian optimization search strategy to personalize LLMs. MiLP treats the feed-forward layers (FFLs) of the network as an analogue of real-world memory, using them to store and activate user information. User memory is injected into the FFLs of the LLM through LoRA modules, and Bayesian optimization determines the configuration used to store and activate different memories. The paper also points out that different memories differ in their sensitivity to the parameter budget and to the position of the injection layer, so multiple LoRA modules and high-dimensional multi-objective Bayesian optimization are needed to find the optimal configuration. Experimental results on three datasets show that MiLP significantly improves performance over the baseline methods (text prompting, memory augmentation, and user embedding methods), supporting its effectiveness. In addition, the paper presents a qualitative analysis and ablation studies that demonstrate the necessity of each MiLP component, as well as the advantage of combining memory injection with instruction fine-tuning. Future work may explore larger user bases and larger-scale LLMs, and may improve the model's reasoning ability so that it better understands user-specific needs.
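To make the idea of injecting memory into a feed-forward layer through LoRA more concrete, the following is a minimal sketch in plain Python. It assumes the standard LoRA formulation, in which a frozen weight matrix W is augmented by a low-rank update scaled by alpha / r, and it is not the authors' implementation. All function names (`lora_init`, `lora_forward`, `lora_param_count`, `total_budget`) and the configuration format (a mapping from layer index to LoRA rank) are hypothetical, chosen only to illustrate how the choice of injection layers and ranks determines the parameter budget that a search procedure would have to trade off.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_init(d_in, d_out, rank, rng):
    """Create LoRA factors (hypothetical helper).

    A (rank x d_in) gets small random values; B (d_out x rank) starts at
    zero, so the low-rank update is zero before any training.
    """
    A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(d_out)]
    return A, B

def lora_forward(x, W, A, B, alpha):
    """Compute W x + (alpha / r) * B (A x) for one input vector x.

    W is the frozen feed-forward weight; only A and B would be trained.
    """
    rank = len(A)
    scale = alpha / rank
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

def lora_param_count(d_in, d_out, rank):
    """Number of trainable parameters added by one LoRA module."""
    return rank * (d_in + d_out)

def total_budget(config, d_in, d_out):
    """Total trainable parameters for a configuration.

    `config` maps an injection layer index to the LoRA rank used there
    (an assumed format for illustration, not taken from the paper).
    """
    return sum(lora_param_count(d_in, d_out, r) for r in config.values())

if __name__ == "__main__":
    rng = random.Random(0)
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # frozen 3 x 2 weight
    A, B = lora_init(d_in=2, d_out=3, rank=1, rng=rng)
    x = [2.0, 3.0]
    # With B still zero, the output equals the frozen layer's output.
    print(lora_forward(x, W, A, B, alpha=1.0))
    # Two candidate configurations with different layers and ranks.
    print(total_budget({0: 2, 5: 4}, d_in=8, d_out=16))
    print(total_budget({3: 8}, d_in=8, d_out=16))
```

In this sketch, a configuration is just a set of (layer, rank) choices, and its cost is the sum of the per-module parameter counts. A search strategy such as the Bayesian optimization described above would evaluate candidate configurations against both quality and budget; the search itself is not shown here.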