Abstract: Large Language Models (LLMs) demonstrate increasingly human-like abilities across a wide variety of tasks. In this paper, we investigate whether LLMs like ChatGPT can accurately infer the psychological dispositions of social media users and whether their ability to do so varies across socio-demographic groups. Specifically, we test whether GPT-3.5 and GPT-4 can derive the Big Five personality traits from users' Facebook status updates in a zero-shot learning scenario. Our results show an average correlation of r = .29 (range = [.22, .33]) between LLM-inferred and self-reported trait scores - a level of accuracy that is similar to that of supervised machine learning models specifically trained to infer personality. Our findings also highlight heterogeneity in the accuracy of personality inferences across different age groups and gender categories: predictions were found to be more accurate for women and younger individuals on several traits, suggesting a potential bias stemming from the underlying training data or differences in online self-expression. The ability of LLMs to infer psychological dispositions from user-generated text has the potential to democratize access to cheap and scalable psychometric assessments for both researchers and practitioners. On the one hand, this democratization might facilitate large-scale research of high ecological validity and spark innovation in personalized services. On the other hand, it also raises ethical concerns regarding user privacy and self-determination, highlighting the need for stringent ethical frameworks and regulation.

What problem does this paper attempt to address?

The paper evaluates whether large language models (LLMs) such as ChatGPT can accurately infer the psychological characteristics of social media users from their posts without task-specific training (i.e., in a zero-shot setting), focusing on the Big Five personality traits (openness, conscientiousness, extraversion, agreeableness, and neuroticism). It also explores whether the accuracy of these inferences varies across sociodemographic groups (such as gender and age) and whether such differences reflect potential biases.

The paper addresses these questions through the following methods:

1. **Data and Sampling**: The study is based on text data from the MyPersonality application, which lets users complete genuine psychometric tests and donate their Facebook profile information for research. The data of 1,000 randomly selected adult users were analyzed.
2. **Measurement Tools**: The International Personality Item Pool (IPIP) scale was used to measure users' Big Five personality traits, and ChatGPT (GPT-3.5 and GPT-4) was used to score users' Facebook status updates and infer their personality traits.
3. **Result Analysis**: Accuracy was evaluated as the correlation between self-reported scores and LLM-inferred scores. The study also analyzed differences in inferred scores, and in their residuals, across genders and age groups to explore potential biases.

The study found that LLMs can infer users' personality traits with moderate accuracy in the zero-shot setting, and more accurately for women and younger individuals. It also found that the LLMs underestimate or overestimate certain personality traits, which may reflect biases in the training data or differences in online self-expression among groups.

These findings are of theoretical significance and open new possibilities for personalized services, while raising ethical concerns about user privacy and autonomy.
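The result analysis described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy scores, and the group labels are hypothetical, and the residual is assumed here to be the inferred score minus the self-reported score, which the summary does not specify.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def mean_residual_by_group(self_report, inferred, groups):
    """Average (inferred - self-reported) score per group.

    Assumed definition: a positive value means the inferred scores run
    higher than the self-reports for that group, a negative value lower.
    """
    by_group = {}
    for s, p, g in zip(self_report, inferred, groups):
        by_group.setdefault(g, []).append(p - s)
    return {g: mean(diffs) for g, diffs in by_group.items()}


# Made-up toy values for a single trait, NOT data from the study.
self_report = [3.0, 4.0, 2.0, 5.0, 3.5, 2.5]
inferred = [3.5, 4.0, 3.0, 4.0, 3.0, 3.0]
groups = ["a", "a", "a", "b", "b", "b"]  # hypothetical group labels

r = pearson_r(self_report, inferred)
residuals = mean_residual_by_group(self_report, inferred, groups)
print(f"r = {r:.2f}")
print(residuals)
```

In a setup like this, the correlation would be computed once per trait, and the per-group residual means would indicate whether a trait tends to be over- or underestimated for a given group.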

Large Language Models Can Infer Psychological Dispositions of Social Media Users
