Can ChatGPT Assess Human Personalities? A General Evaluation Framework

Haocong Rao, Cyril Leung, Chunyan Miao
2023-10-13
Abstract: Large Language Models (LLMs) especially ChatGPT have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored. Existing works study the virtual personalities of LLMs but rarely explore the possibility of analyzing human personalities via LLMs. This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers Briggs Type Indicator (MBTI) tests. Specifically, we first devise unbiased prompts by randomly permuting options in MBTI questions and adopt the average testing result to encourage more impartial answer generation. Then, we propose to replace the subject in question statements to enable flexible queries and assessments on different subjects from LLMs. Finally, we re-formulate the question instructions in a manner of correctness evaluation to facilitate LLMs to generate clearer responses. The proposed framework enables LLMs to flexibly assess personalities of different groups of people. We further propose three evaluation metrics to measure the consistency, robustness, and fairness of assessment results from state-of-the-art LLMs including ChatGPT and GPT-4. Our experiments reveal ChatGPT's ability to assess human personalities, and the average results demonstrate that it can achieve more consistent and fairer assessments in spite of lower robustness against prompt biases compared with InstructGPT.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper explores whether large language models (LLMs) can assess human personality traits. Existing research mainly analyzes the virtual personalities or psychological characteristics of LLMs themselves, but rarely examines whether LLMs can be used to analyze human personalities. This open question is key to verifying the psychological analysis ability of LLMs, and it can also reveal their potential for understanding humans, that is, "How do LLMs view humans?".

By having LLMs assess human personalities based on the Myers-Briggs Type Indicator (MBTI) test, the study aims to:

1. **Understand LLMs' views on humans**: Assessing the personalities of different groups of people helps explain the possible response motives and communication patterns of LLMs.
2. **Detect biases**: Reveal whether LLMs are biased against certain groups of people, so that models can be optimized to generate more equitable content.
3. **Identify ethical and social risks**: Uncover possible ethical and social risks in applying LLMs (such as spreading misinformation) that affect their reliability and safety, and thereby promote the development of more trustworthy and human-friendly LLMs.

To achieve these goals, the paper proposes a general evaluation framework with three key components:

- **Unbiased Prompts**: Randomly permute the options in MBTI questions and average the results of multiple independent tests to obtain more consistent and impartial answers.
- **Subject-Replaced Query**: Replace the original subject in the question statement (such as "you") with the target subject (such as "male" or "barber") to enable flexible queries and assessments of specific subjects.
- **Correctness-Evaluated Instruction**: Reformulate the question instruction so that the LLM judges the correctness of the question statement, which yields clearer responses.

The paper also proposes three quantitative evaluation metrics that measure the consistency, robustness, and fairness of LLMs in assessing human personalities. The experimental results show that ChatGPT and GPT-4 exhibit higher consistency and fairness in assessing human personalities, although their results are less robust to prompt biases; the abstract states this comparison for ChatGPT relative to InstructGPT. These findings provide valuable insights for future psychological, sociological, and governance research on LLMs.
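
The three framework components described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the seven-level answer scale, its numeric scores, the prompt wording, and the `ask` callable (a stand-in for whatever function queries an LLM and returns an option letter) are all assumptions made for illustration.

```python
import random
import re
from statistics import mean

# Assumed 7-level answer scale with illustrative scores; the paper's exact
# option wording and scoring may differ.
SCALE = {
    "correct": 3, "generally correct": 2, "partially correct": 1,
    "neither correct nor wrong": 0,
    "partially wrong": -1, "generally wrong": -2, "wrong": -3,
}


def replace_subject(statement: str, subject: str) -> str:
    """Subject-replaced query: swap a leading 'You' for the target subject."""
    return re.sub(r"^You\b", subject, statement)


def build_prompt(statement: str, subject: str, rng: random.Random):
    """Build one prompt with randomly permuted options (unbiased prompt)
    and a correctness-evaluated instruction. Returns (prompt, option order)."""
    options = list(SCALE)
    rng.shuffle(options)  # a new option order for every independent test
    labelled = ", ".join(f"({chr(65 + i)}) {opt}" for i, opt in enumerate(options))
    query = replace_subject(statement, subject)
    prompt = (
        f'Evaluate whether the following statement is correct: "{query}" '
        f"Options: {labelled}. Answer with one letter."
    )
    return prompt, options


def average_score(statement, subject, ask, n_trials=5, seed=0):
    """Average the scores of several independent tests.

    `ask` is a hypothetical callable: it takes a prompt string and returns
    the chosen option letter, e.g. "B".
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        prompt, options = build_prompt(statement, subject, rng)
        letter = ask(prompt)
        scores.append(SCALE[options[ord(letter) - 65]])
    return mean(scores)


if __name__ == "__main__":
    # A position-biased stub that always answers "A"; averaging over shuffled
    # option orders spreads this bias across different answers.
    print(average_score("You regularly make new friends.", "Barbers", lambda p: "A"))
```

Because the option order changes on every trial, a model that simply favors one position cannot drive the averaged score toward a single answer, which is the intuition behind averaging over permuted prompts.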