Abstract: Large language models are increasingly being used to label or rate psychological features in text data. This approach helps address one of the limiting factors of digital trace data: their lack of an inherent target of measurement. However, this approach is also a form of psychological measurement (using observable variables to quantify a hypothetical latent construct). As such, these ratings are subject to the same psychometric considerations of reliability and validity as more standard psychological measures. Here we present a workflow for developing and evaluating large language model-based measures of psychological features that incorporates these considerations. We also provide an example, attempting to measure the previously established constructs of attitude certainty, importance, and moralization from text. Using a pool of prompts adapted from existing measurement instruments, we find that they have good levels of internal consistency but only partially meet validity criteria.

What problem does this paper attempt to address?

The main problem this paper addresses is the methodological challenge of using large language models (LLMs) to measure psychological characteristics from text data. Digital trace data (such as social media posts) usually lack an inherent target of measurement: the data do not, by themselves, point to any specific psychological process or variable. This is a challenge for computational social science, because researchers need to extract meaningful psychological measurements from such data. The paper proposes that adopting the principles and methods of psychometrics allows LLMs to be used effectively for this purpose. Specifically, the authors propose the following key points:

1. **Clear Objectives**: First, specify which psychological construct is to be measured, including its nature and which other variables it should be related to.
2. **Multi-Prompt Evaluation**: To improve measurement reliability, use multiple different prompts (questions or instructions) for each construct. These prompts can be adapted from existing self-report scales, giving different formulations of the same underlying question.
3. **Internal Consistency Evaluation**: When evaluating model performance, first check that the model measures a latent construct consistently, which can be done by assessing internal consistency across the different prompts.
4. **Validity Verification**: Comparing model ratings with external criteria is important, but these criteria should not be treated as ground truth; they are plausible correlates of the latent construct. Establishing validity means showing that the model ratings predict the outcome variables they are expected to predict.
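Point 3 (internal consistency across prompts) can be sketched by treating each prompt as a scale item and each text as a respondent, then computing Cronbach's alpha. This is a minimal illustration, assuming the LLM ratings have already been collected into a matrix with one row per text and one column per prompt. The function name `cronbach_alpha` and the toy ratings are hypothetical, not taken from the paper, and the paper's own choice of consistency statistic may differ.

```python
import numpy as np


def cronbach_alpha(ratings) -> float:
    """Cronbach's alpha for a (n_texts, n_prompts) matrix of LLM ratings.

    Each column holds one prompt's ratings (treated as one scale "item");
    each row holds every prompt's rating of one text (one "respondent").
    """
    ratings = np.asarray(ratings, dtype=float)
    if ratings.ndim != 2 or ratings.shape[1] < 2:
        raise ValueError("expected a 2-D matrix with at least two prompts")
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1)      # sample variance of each prompt's ratings
    total_var = ratings.sum(axis=1).var(ddof=1)  # sample variance of summed scores per text
    if total_var == 0:
        raise ValueError("summed scores have zero variance; alpha is undefined")
    return float(k / (k - 1) * (1 - item_vars.sum() / total_var))


# Hypothetical toy data: 5 texts, each rated on a 1-7 scale by 3 prompt variants.
ratings = np.array([
    [6, 7, 6],
    [2, 1, 2],
    [4, 4, 5],
    [5, 6, 5],
    [1, 2, 1],
])
alpha = cronbach_alpha(ratings)
print(f"Cronbach's alpha across prompts: {alpha:.3f}")
```

Values near 1 suggest the prompt variants are tapping a common construct; a low value would signal that some prompts may be measuring something different and could need revision or removal.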
Through these methods, the paper aims to establish a systematic workflow for developing and evaluating LLM-based psychological measurement tools, thereby improving the reliability and validity of psychological information extracted from text data.
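The validity step (point 4) can likewise be sketched as a check that a composite score correlates in the theoretically expected direction with each external criterion, treating the criteria as correlates rather than ground truth. This is a minimal sketch under stated assumptions: averaging across prompts is just one simple way to form a composite, not necessarily the paper's, and the helper names (`composite_score`, `check_validity`), criterion names, values, and expected signs are all hypothetical placeholders.

```python
import numpy as np


def composite_score(ratings) -> np.ndarray:
    """Average each text's ratings across prompts (rows = texts, columns = prompts)."""
    return np.asarray(ratings, dtype=float).mean(axis=1)


def check_validity(composite, criteria):
    """Correlate the composite score with each external criterion.

    `criteria` maps a criterion name to (values, expected_sign), where
    expected_sign is +1 or -1 according to theory. Criteria are treated as
    correlates of the latent construct, not as ground truth.
    Returns {name: (pearson_r, sign_matches_expectation)}.
    """
    results = {}
    for name, (values, expected_sign) in criteria.items():
        r = np.corrcoef(composite, np.asarray(values, dtype=float))[0, 1]
        results[name] = (float(r), bool(np.sign(r) == expected_sign))
    return results


# Hypothetical toy data: 5 texts x 3 prompt variants (1-7 scale).
ratings = np.array([
    [6, 7, 6],
    [2, 1, 2],
    [4, 4, 5],
    [5, 6, 5],
    [1, 2, 1],
])
# Hypothetical criteria with their theoretically expected directions.
criteria = {
    "self_reported_certainty": ([6, 2, 4, 6, 1], +1),
    "hedging_word_rate": ([0.1, 0.5, 0.3, 0.2, 0.6], -1),
}
results = check_validity(composite_score(ratings), criteria)
for name, (r, ok) in results.items():
    print(f"{name}: r = {r:+.2f}, expected direction: {ok}")
```

A matching sign is only a first screen. In practice one would also consider the magnitude and uncertainty of each correlation, and whether the measure correlates weakly with variables it should not be related to (discriminant validity).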

From traces to measures: Large language models as a tool for psychological measurement from text

Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models

AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories

Beyond rating scales: With targeted evaluation, large language models are poised for psychological assessment

Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales

Measuring Latent Trust Patterns in Large Language Models in the Context of Human-AI Teaming

Rediscovering the Latent Dimensions of Personality with Large Language Models as Trait Descriptors

Revisiting the Reliability of Psychological Scales on Large Language Models

Using large language models in psychology

Cognitive phantoms in LLMs through the lens of latent variables

Personality Traits in Large Language Models

Is Machine Psychology here? On Requirements for Using Human Psychological Tests on Large Language Models

Enhancing health assessments with large language models: A methodological approach

Eliciting Big Five Personality Traits in Large Language Models: A Textual Analysis with Classifier-Driven Approach

Perils and opportunities in using large language models in psychological research

Large Language Models respond to Influence like Humans

Large Language Models Can Infer Psychological Dispositions of Social Media Users

Questioning the Survey Responses of Large Language Models

The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models

Large language models as linguistic simulators and cognitive models in human research