Identifying Multiple Personalities in Large Language Models with External Evaluation

Xiaoyang Song, Yuta Adachi, Jessie Feng, Mouwei Lin, Linhao Yu, Frank Li, Akshat Gupta, Gopala Anumanchipalli, Simerjot Kaur
2024-02-23
Abstract: As Large Language Models (LLMs) are rapidly integrated into everyday human applications, many societal and ethical concerns have been raised regarding their behavior. One way to understand LLMs' behavior is to analyze their personalities. Many recent studies quantify LLMs' personalities using self-assessment tests that were created for humans. Yet many critiques question the applicability and reliability of these self-assessment tests when applied to LLMs. In this paper, we investigate LLM personalities using an alternative personality measurement method, which we refer to as the external evaluation method: instead of prompting LLMs with multiple-choice questions on a Likert scale, we evaluate LLMs' personalities by analyzing their responses to open-ended situational questions with an external machine learning model. We first fine-tune a Llama2-7B model as an MBTI personality predictor, which outperforms state-of-the-art models, to serve as the tool for analyzing LLMs' responses. Then, we prompt the LLMs with situational questions and ask them to generate Twitter posts and comments, respectively, in order to assess their personalities when playing two different roles. Using the external personality evaluation method, we find that the personality types obtained for LLMs differ significantly when generating posts versus comments, whereas humans show a consistent personality profile in these two situations. This shows that LLMs can exhibit different personalities in different scenarios, highlighting a fundamental difference between personality in LLMs and humans. With our work, we call for a re-evaluation of how personality is defined and measured in LLMs.
Artificial Intelligence, Computation and Language
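The external evaluation pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: `generate` and `predict_axes` are hypothetical stand-ins for the LLM under test and for the fine-tuned Llama2-7B predictor, and the predictor is assumed to output one probability per MBTI dichotomy (E/I, S/N, T/F, J/P) rather than a single 16-way label.

```python
from collections import Counter
from typing import Callable, Dict, List

# The four MBTI dichotomies. ASSUMPTION: each is treated as an independent
# binary decision; the paper's predictor may instead emit a 16-way label.
AXES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def combine_axes(probs: List[float]) -> str:
    """Map four per-axis probabilities (of the first letter) to a 4-letter type."""
    if len(probs) != 4:
        raise ValueError("expected one probability per MBTI axis")
    return "".join(first if p >= 0.5 else second
                   for (first, second), p in zip(AXES, probs))


def external_evaluation(
    questions: List[str],
    generate: Callable[[str, str], str],         # hypothetical: (question, role) -> LLM text
    predict_axes: Callable[[str], List[float]],  # hypothetical: text -> per-axis probabilities
) -> Dict[str, Counter]:
    """Tally predicted MBTI types separately for each role (post vs. comment)."""
    tallies = {"post": Counter(), "comment": Counter()}
    for question in questions:
        for role in tallies:
            text = generate(question, role)  # the LLM answers the situational question in this role
            tallies[role][combine_axes(predict_axes(text))] += 1
    return tallies
```

Keeping a separate tally per role is what makes a post-versus-comment comparison possible. Thresholding each axis at 0.5 is only one simple choice; a 16-class predictor would replace `combine_axes` entirely.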
What problem does this paper attempt to address?
The paper addresses how Large Language Models (LLMs) express personality when playing different roles, and whether the methods used to measure that personality are valid. Specifically:

1. **Alternative Personality Measurement Method**: The paper proposes an external evaluation method as an alternative to traditional self-assessment tests. Instead of prompting LLMs with Likert-scale multiple-choice questions, it analyzes their responses to open-ended situational questions with an external machine learning model.
2. **Validation of the Model's Effectiveness**: The researchers first fine-tuned a Llama2-7B model as an MBTI personality predictor, which outperforms state-of-the-art models, and then used it to analyze the responses of different LLMs playing different roles.
3. **Comparison of Human and LLM Personality Consistency**: LLMs exhibit significantly different personality types when generating Twitter posts versus comments, whereas humans maintain a consistent personality type in both contexts. This suggests that LLM personalities may lack persistence, which conflicts with the definition of human personality as a set of persistent traits.
4. **Redefining the Concept of LLM Personality**: Based on these findings, the paper calls for a re-evaluation of how LLM personality is defined and measured, arguing that the standards and methods developed for human personality may not be fully applicable to LLMs.

The authors conclude that new theoretical frameworks are needed to better understand and describe the personality traits of LLMs.
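Item 3 reports a significant difference between the personality types LLMs show in posts and in comments, but this summary does not say which statistical test was used. As an assumption for illustration only, one simple check is a two-sided two-proportion z-test on each MBTI axis, comparing how often posts and comments receive, for example, E rather than I. The function names are hypothetical, and the input is assumed to be counts of predicted 4-letter types per role.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Two-sided z-test for H0: both groups share the same underlying proportion.

    x1/n1 and x2/n2 are (count of the first letter) / (total samples) per group.
    Returns (z, p_value).
    """
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0.0:
        # Every sample in both groups has the same letter on this axis.
        return 0.0, 1.0
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def compare_roles(post_types: Counter, comment_types: Counter) -> Dict[str, Tuple[float, float]]:
    """Hypothetical helper: run the z-test on each MBTI axis, posts vs. comments."""
    n1 = sum(post_types.values())
    n2 = sum(comment_types.values())
    results = {}
    for i, (first, second) in enumerate(zip("ESTJ", "INFP")):
        # Count samples whose i-th letter is the first pole of this axis.
        x1 = sum(c for t, c in post_types.items() if t[i] == first)
        x2 = sum(c for t, c in comment_types.items() if t[i] == first)
        results[f"{first}/{second}"] = two_proportion_z_test(x1, n1, x2, n2)
    return results
```

With toy counts (not the paper's data) such as `compare_roles(Counter({"ENTP": 70, "INTP": 30}), Counter({"ENTP": 40, "INTP": 60}))`, only the E/I axis is flagged (z ≈ 4.26, p < 0.001); the other three axes return z = 0, p = 1. Because four axes are tested, a multiple-comparison correction such as Bonferroni would be prudent, and a chi-square test over all 16 types is a coarser alternative.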