Cognitive phantoms in LLMs through the lens of latent variables

Sanne Peereboom, Inga Schwabe, Bennett Kleinberg
2024-09-06
Abstract: Large language models (LLMs) increasingly reach real-world applications, necessitating a better understanding of their behaviour. Their size and complexity complicate traditional assessment methods, causing the emergence of alternative approaches inspired by the field of psychology. Recent studies administering psychometric questionnaires to LLMs report human-like traits in LLMs, potentially influencing LLM behaviour. However, this approach suffers from a validity problem: it presupposes that these traits exist in LLMs and that they are measurable with tools designed for humans. Typical procedures rarely acknowledge the validity problem in LLMs, comparing and interpreting average LLM scores. This study investigates this problem by comparing latent structures of personality between humans and three LLMs using two validated personality questionnaires. Findings suggest that questionnaires designed for humans do not validly measure similar constructs in LLMs, and that these constructs may not exist in LLMs at all, highlighting the need for psychometric analyses of LLM responses to avoid chasing cognitive phantoms.

Keywords: large language models, psychometrics, machine behaviour, latent variable modeling, validity
Artificial Intelligence, Human-Computer Interaction
What problem does this paper attempt to address?
The paper examines whether psychometric questionnaires designed for humans can validly be used to study the behavior of large language models (LLMs). Specifically, it addresses the following key issues:

1. **Validity of Psychometric Questionnaires**: Current research often administers psychometric questionnaires designed for humans directly to LLMs and infers the behavioral characteristics of LLMs from the resulting scores. This approach rests on an untested assumption: that these psychological traits exist in LLMs and can be accurately measured by tools designed for humans.
2. **Differences in Latent Variable Structures**: The paper compares the latent variable structures of a human sample with those of three LLMs on two validated personality questionnaires. Its findings suggest that questionnaires designed for humans do not validly measure similar psychological constructs in LLMs, and that these constructs may not exist in LLMs at all.
3. **Validation of Measurement Tools**: The paper emphasizes the need for psychometric validation of questionnaires before they are applied to LLMs, to avoid "chasing cognitive phantoms." That is, questionnaire scores alone, without a sound theoretical and psychometric basis, do not support meaningful conclusions.

In summary, the paper questions the validity of existing methodology and calls for a more rigorous approach to behavioral research on LLMs, in particular the use of latent variable analysis to check whether psychometric tools are applicable to LLMs at all.
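
To illustrate why average scores alone say little about whether a latent trait underlies a set of responses, the following minimal sketch compares the mean inter-item correlation of two simulated response sets. It is not the authors' analysis: the data are synthetic, and the function names (`simulate`, `mean_inter_item_correlation`), the one-factor generating model, and the loading of 0.7 are all assumptions made for illustration. Both response sets have roughly the same item means, but only one has items that covary as a shared latent variable would predict.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))


def mean_inter_item_correlation(responses):
    """Average correlation over all item pairs (rows = respondents, columns = items)."""
    items = list(zip(*responses))  # transpose rows into item columns
    return mean(pearson(a, b) for a, b in combinations(items, 2))


def simulate(n_respondents, n_items, loading, rng):
    """Hypothetical one-factor model: item = loading * trait + unique noise."""
    rows = []
    for _ in range(n_respondents):
        trait = rng.gauss(0, 1)  # latent trait score for this respondent
        unique_sd = (1 - loading ** 2) ** 0.5  # keeps item variance near 1
        rows.append([loading * trait + unique_sd * rng.gauss(0, 1)
                     for _ in range(n_items)])
    return rows


rng = random.Random(0)
structured = simulate(500, 8, 0.7, rng)    # items driven by a shared trait
unstructured = simulate(500, 8, 0.0, rng)  # items are independent noise

r_structured = mean_inter_item_correlation(structured)
r_unstructured = mean_inter_item_correlation(unstructured)
print(f"shared trait:    mean inter-item r = {r_structured:.2f}")
print(f"no shared trait: mean inter-item r = {r_unstructured:.2f}")
```

A scale total can be computed for either response set, yet only in the first case do the items behave as indicators of a common construct. A full latent variable analysis, such as factor analysis, examines this covariance structure in much more detail than this single summary statistic.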