Editing Personality for Large Language Models

Shengyu Mao, Xiaohan Wang, Mengru Wang, Yong Jiang, Pengjun Xie, Fei Huang, Ningyu Zhang
2024-09-01
Abstract: This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs). This task seeks to adjust the models' responses to opinion-related questions on specified topics since an individual's personality often manifests in the form of their expressed opinions, thereby showcasing different personality traits. Specifically, we construct PersonalityEdit, a new benchmark dataset to address this task. Drawing on the theory in Social Psychology, we isolate three representative traits, namely Neuroticism, Extraversion, and Agreeableness, as the foundation for our benchmark. We then gather data using GPT-4, generating responses that align with a specified topic and embody the targeted personality trait. We conduct comprehensive experiments involving various baselines and discuss the representation of personality behavior in LLMs. Our findings uncover potential challenges of the proposed task, illustrating several remaining issues. We anticipate that our work can stimulate further annotation in model editing and personality-related research. Code is available at https://github.com/zjunlp/EasyEdit.
Computation and Language, Artificial Intelligence, Computers and Society, Machine Learning, Multiagent Systems
What problem does this paper attempt to address?
This paper addresses how to edit the personality traits of large language models (LLMs). The authors propose a new task: adjusting how an LLM responds to opinion-related questions on a specified topic so that its answers exhibit a target personality trait. To support the task, they construct a benchmark dataset, PersonalityEdit, built on three traits: Neuroticism, Extraversion, and Agreeableness. They use GPT-4 to generate responses that match a given topic and trait, run comprehensive experiments with various model editing methods, and discuss how personality behavior manifests in LLMs.

The main contributions of this work are:
1. It explores, for the first time, the challenge of editing the personality traits of LLMs and proposes a benchmark dataset, PersonalityEdit.
2. It uses GPT-4 for topic-constrained, trait-guided data generation, combined with automatic and manual validation to ensure data quality.
3. It proposes several metrics for evaluating personality traits in generated text and analyzes different baseline methods, finding that existing methods can edit personality to some extent but that the results remain unsatisfactory, which highlights the difficulty of the task.
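To make the idea of topic-constrained, trait-guided generation concrete, here is a minimal Python sketch. It is an illustration only, not the authors' pipeline or the PersonalityEdit schema: the names `EditRequest` and `build_generation_prompt`, the prompt wording, and the example topic are all hypothetical. Only the three trait names come from the summary above.

```python
from dataclasses import dataclass

# The three traits named in the paper, stored uppercase for lookup.
TRAITS = ("NEUROTICISM", "EXTRAVERSION", "AGREEABLENESS")


@dataclass(frozen=True)
class EditRequest:
    """Hypothetical container for one request: a topic plus a target trait."""
    topic: str
    target_trait: str

    def __post_init__(self):
        # Reject traits outside the three the benchmark is built on.
        if self.target_trait.upper() not in TRAITS:
            raise ValueError(f"unknown trait: {self.target_trait!r}")


def build_generation_prompt(request: EditRequest) -> str:
    """Compose an illustrative prompt: an opinion question on the topic,
    to be answered in a voice that reflects the target trait.
    The wording is an assumption, not the prompt used in the paper."""
    trait = request.target_trait.lower()
    return (
        f"Answer in a way that reflects high {trait}.\n"
        f"Question: What is your opinion of {request.topic}?"
    )


if __name__ == "__main__":
    # "remote work" is a made-up topic chosen for illustration.
    print(build_generation_prompt(EditRequest("remote work", "Extraversion")))
```

Validating the trait at construction time keeps every request within the three traits the benchmark covers. A real pipeline would also need the automatic and manual quality checks that the summary mentions.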