Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs

Xiaoyang Song, Akshat Gupta, Kiyan Mohebbizadeh, Shujie Hu, Anant Singh
DOI: https://doi.org/10.48550/arXiv.2305.14693
2023-05-24
Abstract: Have Large Language Models (LLMs) developed a personality? The short answer is a resounding "We Don't Know!". In this paper, we show that we do not yet have the right tools to measure personality in language models. Personality is an important characteristic that influences behavior. As LLMs emulate human-like intelligence and performance in various tasks, a natural question to ask is whether these models have developed a personality. Previous works have evaluated machine personality through self-assessment personality tests, which are sets of multiple-choice questions created to evaluate personality in humans. A fundamental assumption here is that human personality tests can accurately measure personality in machines. In this paper, we investigate the emergence of personality in five LLMs of different sizes ranging from 1.5B to 30B. We propose the Option-Order Symmetry property as a necessary condition for the reliability of these self-assessment tests. Under this condition, the answer to self-assessment questions is invariant to the order in which the options are presented. We find that many LLMs' personality test responses do not preserve option-order symmetry. We take a deeper look at LLMs' test responses where option-order symmetry is preserved and find that in these cases, LLMs do not take into account the situational statement being tested and produce the exact same answer irrespective of the situation being tested. We also identify the existence of inherent biases in these LLMs, which are the root cause of the aforementioned phenomenon and make self-assessment tests unreliable. These observations indicate that self-assessment tests are not the correct tools to measure personality in LLMs. Through this paper, we hope to draw attention to the shortcomings of current literature in measuring personality in LLMs and call for developing tools for machine personality measurement.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper asks two questions: have large language models (LLMs) developed personalities, and if so, how can those personalities be measured? Specifically, it explores the following issues:
1. **Applicability of personality measurement tools**: Are the existing self-assessment tests designed to measure human personality applicable to measuring personality in LLMs?
2. **Definition of personality**: In the research, personality is defined as a consistent pattern of behavior exhibited in different situations. The paper focuses on the behavior patterns of these models in the real world, rather than on whether they possess human emotions, self-awareness, or consciousness.
3. **Key attributes of test reliability**: The paper proposes "Option-Order Symmetry" as a necessary condition for the reliability of self-assessment tests. This property requires that, for the same question, the model's response remain unchanged regardless of the order in which the options are presented.
4. **Limitations of existing methods**: The paper finds that many LLMs do not satisfy option-order symmetry when answering self-assessment tests, which makes the test results unreliable. In addition, where symmetry is preserved, the models do not take the situational statement into account and give the same answer irrespective of the situation, further indicating that these tests are ineffective for measuring model personality.
5. **Inherent biases**: The paper also identifies inherent biases in these language models. These biases cause the models to show consistent preferences for certain choices, which undermines the reliability of the tests.
Through these studies, the paper hopes to draw the attention of the academic community to the deficiencies of current methods for measuring personality in LLMs and calls for the development of tools designed specifically for measuring machine personality.
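The option-order symmetry condition and the "same answer irrespective of the situation" observation can be illustrated with a small sketch. This is not the paper's evaluation code or protocol: the `AnswerFn` interface, the two toy "models", the sample statements, and the three-option scale are all assumptions made for illustration, and checking every permutation is just one simple way to probe order invariance.

```python
from itertools import permutations
from typing import Callable, Sequence

# Hypothetical interface (an assumption, not from the paper): takes a
# statement and an ordered list of option texts, returns the chosen text.
AnswerFn = Callable[[str, Sequence[str]], str]


def is_option_order_symmetric(
    answer_fn: AnswerFn, statement: str, options: Sequence[str]
) -> bool:
    """True if the chosen option text is identical for every option ordering."""
    chosen = {answer_fn(statement, list(order)) for order in permutations(options)}
    return len(chosen) == 1


def ignores_statement(
    answer_fn: AnswerFn, statements: Sequence[str], options: Sequence[str]
) -> bool:
    """True if the answer is the same for every statement (fixed option order)."""
    return len({answer_fn(s, options) for s in statements}) == 1


def position_biased_model(statement: str, options: Sequence[str]) -> str:
    """Toy stand-in for an LLM that always picks whatever is listed first."""
    return options[0]


def content_biased_model(statement: str, options: Sequence[str]) -> str:
    """Toy stand-in that always picks 'Agree', wherever it appears."""
    return "Agree" if "Agree" in options else options[0]


# Illustrative inputs only; not the questionnaire used in the paper.
OPTIONS = ["Disagree", "Neutral", "Agree"]
STATEMENTS = ["I am the life of the party.", "I get stressed out easily."]

position_symmetric = is_option_order_symmetric(
    position_biased_model, STATEMENTS[0], OPTIONS
)
content_symmetric = is_option_order_symmetric(
    content_biased_model, STATEMENTS[0], OPTIONS
)
content_ignores = ignores_statement(content_biased_model, STATEMENTS, OPTIONS)

print(position_symmetric, content_symmetric, content_ignores)
```

The second toy model passes the symmetry check yet gives the same answer to every statement, which is why symmetry is described as a necessary condition for reliability rather than a sufficient one.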