From traces to measures: Large language models as a tool for psychological measurement from text

Joseph J.P. Simons, Wong Liang Ze, Prasanta Bhattacharya, Brandon Siyuan Loh, Wei Gao
2024-10-14
Abstract: Large language models are increasingly being used to label or rate psychological features in text data. This approach helps address one of the limiting factors of digital trace data - their lack of an inherent target of measurement. However, this approach is also a form of psychological measurement (using observable variables to quantify a hypothetical latent construct). As such, these ratings are subject to the same psychometric considerations of reliability and validity as more standard psychological measures. Here we present a workflow for developing and evaluating large language model based measures of psychological features which incorporate these considerations. We also provide an example, attempting to measure the previously established constructs of attitude certainty, importance and moralization from text. Using a pool of prompts adapted from existing measurement instruments, we find they have good levels of internal consistency but only partially meet validity criteria.
Human-Computer Interaction
What problem does this paper attempt to address?
This paper addresses the methodological challenges of using large language models (LLMs) to measure psychological characteristics from text data. Digital trace data (such as social media posts) usually lack an inherent target of measurement: the data themselves do not point to any specific psychological process or variable. This is a problem for computational social science, where researchers need to extract meaningful psychological measurements from such data. The paper argues that applying the principles and methods of psychometrics allows large language models to be used effectively for this purpose. The authors propose the following key points:

1. **Clear Objectives**: First, specify which psychological construct is to be measured, including its nature and which other variables it should be related to.
2. **Multi-Prompt Evaluation**: To improve the reliability of measurement, use multiple different prompts (i.e., questions or instructions) for each psychological construct. These prompts can be adapted from existing self-report scales so that they offer different formulations of the same basic question.
3. **Internal Consistency Evaluation**: Before anything else, check that the model measures a single latent construct consistently, by evaluating the internal consistency among the ratings produced by the different prompts.
4. **Validity Verification**: Comparing model ratings with external criteria is important, but these criteria should not be treated as absolute ground truth. They are instead expected correlates of the latent construct. Validating the measure means showing that the model ratings predict the outcome variables they are expected to predict.

With these steps, the paper aims to establish a systematic workflow for developing and evaluating psychological measurement tools based on large language models, thereby improving the reliability and validity of psychological information extracted from text data.
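The internal consistency and validity steps can be illustrated with a minimal sketch. It assumes Cronbach's alpha as the internal consistency statistic and a Pearson correlation with an external criterion as the validity check; the summary above does not state which statistics the authors used, and the ratings and criterion values below are made-up illustrative numbers, not data from the paper. The function names `cronbach_alpha` and `pearson_r` are likewise hypothetical helpers.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a texts-by-prompts table of ratings.

    ratings: list of rows, one row per text, one rating per prompt.
    """
    k = len(ratings[0])
    if k < 2:
        raise ValueError("need at least two prompts per construct")
    # Sample variance of each prompt's ratings across texts.
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Sample variance of the summed score per text.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical LLM ratings: 5 texts, each rated by 3 prompt variants
# targeting the same construct (e.g. attitude certainty) on a 1-5 scale.
ratings = [
    [5, 4, 5],
    [2, 2, 3],
    [4, 4, 4],
    [1, 2, 1],
    [3, 3, 4],
]
# Hypothetical external criterion for the same 5 texts. It is treated
# as an expected correlate of the construct, not as ground truth.
criterion = [4.5, 2.0, 4.0, 1.5, 3.0]

# Step 3: do the prompts agree with each other?
alpha = cronbach_alpha(ratings)

# Step 4: does the averaged rating relate to the criterion as expected?
scale_scores = [mean(row) for row in ratings]
validity_r = pearson_r(scale_scores, criterion)

print(f"Cronbach's alpha: {alpha:.3f}")
print(f"Correlation with criterion: {validity_r:.3f}")
```

Averaging across prompts before the validity check mirrors how multi-item self-report scales are usually scored. A high alpha only shows that the prompts agree with one another, which is why the correlation with an expected correlate is computed as a separate step.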