Abstract: Very large language models (LLMs) perform extremely well on a spectrum of NLP tasks in a zero-shot setting. However, little is known about their performance on human-level NLP problems which rely on understanding psychological concepts, such as assessing personality traits. In this work, we investigate the zero-shot ability of GPT-3 to estimate the Big 5 personality traits from users' social media posts. Through a set of systematic experiments, we find that zero-shot GPT-3 performance is somewhat close to an existing pre-trained SotA for broad classification upon injecting knowledge about the trait in the prompts. However, when prompted to provide fine-grained classification, its performance drops to close to a simple most frequent class (MFC) baseline. We further analyze where GPT-3 performs better, as well as worse, than a pretrained lexical model, illustrating systematic errors that suggest ways to improve LLMs on human-level NLP tasks.
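As a point of reference for the most frequent class (MFC) baseline mentioned in the abstract, the following is a minimal sketch of how such a baseline is usually computed: it always predicts the label that occurs most often, so its accuracy equals the share of that label. The function name `mfc_baseline` and the toy label lists are assumptions made for illustration; they are not taken from the paper or its data.

```python
from collections import Counter


def mfc_baseline(gold_labels):
    """Return the most frequent class and the accuracy of always predicting it."""
    counts = Counter(gold_labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(gold_labels)


# Toy labels invented for illustration only; they are not data from the paper.
binary_labels = ["high", "low", "high", "high", "low", "high"]
three_way_labels = ["high", "medium", "low", "medium", "medium", "low"]

binary_mfc = mfc_baseline(binary_labels)
three_way_mfc = mfc_baseline(three_way_labels)

print("binary MFC:", binary_mfc)
print("three-way MFC:", three_way_mfc)
```

Any classifier, zero-shot or supervised, needs to beat this number on the same label distribution before its predictions can be considered informative.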

What problem does this paper attempt to address?

This paper evaluates how well large language models (LLMs), specifically GPT-3, can estimate personality traits in a zero-shot setting. The researchers ask whether GPT-3 can estimate the "Big Five" personality traits (openness, conscientiousness, extraversion, agreeableness, and neuroticism) from users' social media posts. Through a series of systematic experiments, they examine how injecting different types of knowledge about a trait into the prompt (definitions, lists of related words, and questionnaire item descriptions) affects GPT-3's performance.

The main contributions of the paper are:

1. It explores which information about personality is useful to GPT-3.
2. It compares GPT-3 with current state-of-the-art methods (such as lexicon-based methods) for estimating personality traits.
3. It analyzes how the ordinal nature of the outcome labels relates to model performance.
4. It examines whether GPT-3's predictions remain consistent when similar external knowledge is provided.

The study finds that GPT-3 performs relatively well when the task is simplified to binary classification, but its performance drops significantly on the more fine-grained three-class task. GPT-3 also performs better on some specific personality traits, especially when questionnaire item descriptions (ITEMDESC) are supplied in the prompt. Overall, however, the average zero-shot performance of GPT-3 remains below that of heavily trained supervised models (such as WT-LEX). These findings clarify the capabilities and limitations of LLMs on human-level natural language processing tasks and point to directions for future improvement.
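The knowledge-injection setup summarized above can be sketched as a small prompt builder. This is only an illustration under stated assumptions: the function name `build_prompt`, the prompt wording, the placeholder knowledge string, and the example posts are all hypothetical and are not the prompts used in the paper. The knowledge text is supplied by the caller, so it could be a definition, a word list, or questionnaire item descriptions.

```python
def build_prompt(trait, knowledge, posts, labels):
    """Assemble a zero-shot classification prompt (hypothetical template).

    trait:     name of the personality trait, e.g. "extraversion"
    knowledge: optional text about the trait to inject (may be empty)
    posts:     list of social media posts written by one user
    labels:    allowed answer labels, e.g. ["high", "low"]
    """
    post_block = "\n".join(f"- {post}" for post in posts)
    label_text = ", ".join(labels)

    parts = []
    if knowledge:
        # Injected knowledge comes first so the model reads it before the posts.
        parts.append(f"Background on {trait}: {knowledge}")
    parts.append(f"Posts written by one user:\n{post_block}")
    parts.append(
        f"Classify this user's level of {trait}. "
        f"Answer with exactly one of: {label_text}."
    )
    return "\n\n".join(parts)


# Placeholder inputs invented for illustration only.
example_posts = ["Had a great time at the party last night!", "Quiet day at home."]
binary_prompt = build_prompt(
    "extraversion", "<trait definition goes here>", example_posts, ["high", "low"]
)
three_way_prompt = build_prompt(
    "extraversion", "", example_posts, ["high", "medium", "low"]
)

print(binary_prompt)
```

Switching between the binary and three-class variants only changes the `labels` argument, which makes it easy to compare the coarse and fine-grained settings with the same posts and the same injected knowledge.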

Systematic Evaluation of GPT-3 for Zero-Shot Personality Estimation

Large Language Models Can Infer Psychological Dispositions of Social Media Users

A procedure for the strategic planning of locations, capacities and districting of jails: application to Chile

PersonaLLM: Investigating the Ability of GPT-3.5 to Express Personality Traits and Gender Differences

PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits

Can ChatGPT Assess Human Personalities? A General Evaluation Framework

Challenging the Validity of Personality Tests for Large Language Models

Artificial Intelligence and Personality: Large Language Models’ Ability to Predict Personality Type

Using cognitive psychology to understand GPT-3

Humanity in AI: Detecting the Personality of Large Language Models

Large Language Models Can Infer Personality from Free-Form User Interactions

Can Large Language Models Assess Personality from Asynchronous Video Interviews? A Comprehensive Evaluation of Validity, Reliability, Fairness, and Rating Patterns

Large language models know how the personality of public figures is perceived by the general public

Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis

PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection

Revisiting the Reliability of Psychological Scales on Large Language Models

Large language models and humans converge in judging public figures' personalities

Dynamic Generation of Personalities with Large Language Models

Identifying and Manipulating the Personality Traits of Language Models

Is ChatGPT a Good Personality Recognizer? A Preliminary Study

Personality Traits in Large Language Models