What makes your model a low-empathy or warmth person: Exploring the Origins of Personality in LLMs

Shu Yang, Shenzhe Zhu, Ruoxuan Bao, Liang Liu, Yu Cheng, Lijie Hu, Mengdi Li, Di Wang
2024-10-08
Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in generating human-like text and exhibiting personality traits similar to those in humans. However, the mechanisms by which LLMs encode and express traits such as agreeableness and impulsiveness remain poorly understood. Drawing on the theory of social determinism, we investigate how long-term background factors, such as family environment and cultural norms, interact with short-term pressures like external instructions, shaping and influencing LLMs' personality traits. By steering the output of LLMs through the utilization of interpretable features within the model, we explore how these background and pressure factors lead to changes in the model's traits without the need for further fine-tuning. Additionally, we suggest the potential impact of these factors on model safety from the perspective of personality.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the following problem: large language models (LLMs) can generate human-like text and exhibit human-like personality traits, such as agreeableness and impulsiveness, yet the mechanisms by which they encode and express these traits remain unclear. Drawing on the theory of social determinism, the paper explores how long-term background factors (such as family environment and cultural norms) and short-term pressures (such as external instructions) interact to shape the personality traits of LLMs. It steers model outputs through interpretable features within the model, which makes it possible to observe how these factors change the model's traits without further fine-tuning. Finally, the paper discusses the impact of these factors on model safety from a personality perspective.
Specifically, the paper focuses on two core questions:
1. How do long-term background factors and short-term pressures shape and influence the personality traits of LLMs? Why do LLMs exhibit personality traits similar to low empathy or warmth?
2. How do these personality traits affect the safety of LLMs? For example, does higher agreeableness make LLMs more susceptible to jailbreak attacks?
By exploring these questions, the paper aims to reveal the mechanisms behind the formation of LLMs' personality traits and to propose methods to control and adjust these traits to improve the model's safety and reliability.
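The idea of steering outputs through interpretable features, without fine-tuning, can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a common form of activation steering in which a scaled, unit-normalized feature direction is added to a hidden activation vector. The names `steer` and `projection`, the vectors, and the strength value are all hypothetical.

```python
# Toy sketch of feature-based steering (an assumption, not the authors' code):
# shift a hidden activation along an interpretable feature direction.
# All names and numbers here are hypothetical and chosen for illustration.

def _norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return sum(x * x for x in vec) ** 0.5

def projection(hidden, feature_direction):
    """Component of `hidden` along the unit-normalized feature direction."""
    n = _norm(feature_direction)
    if n == 0:
        raise ValueError("feature_direction must be non-zero")
    return sum(h * d for h, d in zip(hidden, feature_direction)) / n

def steer(hidden, feature_direction, strength):
    """Return hidden + strength * unit(feature_direction).

    A positive strength pushes the activation toward the feature,
    a negative strength pushes it away. No model weights are changed.
    """
    if len(hidden) != len(feature_direction):
        raise ValueError("hidden and feature_direction must have equal length")
    n = _norm(feature_direction)
    if n == 0:
        raise ValueError("feature_direction must be non-zero")
    return [h + strength * d / n for h, d in zip(hidden, feature_direction)]

# Hypothetical 3-dimensional activation and feature direction.
hidden = [0.5, -1.0, 2.0]
direction = [3.0, 0.0, 4.0]
steered = steer(hidden, direction, strength=2.0)

# The component along the feature grows by exactly `strength`.
shift = projection(steered, direction) - projection(hidden, direction)
```

In a real model the hidden vector would come from a specific layer's activations and the direction from an interpretability method; both choices are left open here because the summary above does not specify them.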