Abstract: Human values and their measurement are a long-standing subject of interdisciplinary inquiry. Recent advances in AI have sparked renewed interest in this area, with large language models (LLMs) emerging as both tools and subjects of value measurement. This work introduces Generative Psychometrics for Values (GPV), an LLM-based, data-driven value measurement paradigm, theoretically grounded in text-revealed selective perceptions. We begin by fine-tuning an LLM for accurate perception-level value measurement and verifying the capability of LLMs to parse texts into perceptions, forming the core of the GPV pipeline. Applying GPV to human-authored blogs, we demonstrate its stability, validity, and superiority over prior psychological tools. Then, extending GPV to LLM value measurement, we advance the current art with 1) a psychometric methodology that measures LLM values based on their scalable and free-form outputs, enabling context-specific measurement; 2) a comparative analysis of measurement paradigms, indicating response biases of prior methods; and 3) an attempt to bridge LLM values and their safety, revealing the predictive power of different value systems and the impacts of various values on LLM safety. Through interdisciplinary efforts, we aim to leverage AI for next-generation psychometrics and psychometrics for value-aligned AI.

What problem does this paper attempt to address?

The paper addresses two problems:

1. **Measurement of Human Values**: Traditional psychological measurement tools, such as self-report questionnaires, suffer from response bias, are resource-intensive, and struggle to capture real behavior. The paper proposes Generative Psychometrics for Values (GPV), an LLM-based method that measures an individual's values from the perceptions revealed in their text.
2. **Measurement of LLM Values**: As LLMs become more prevalent in public applications, it is crucial to measure their values reliably. Existing methods, such as self-report questionnaires, are not fully applicable to LLMs, and they are static and do not scale. GPV addresses these problems by measuring values from LLMs' free-form outputs, which enables context-specific measurement.

The paper demonstrates the effectiveness of GPV in three steps:

- **Model Training and Validation**: The Llama 3 model is fine-tuned for perception-level value measurement. The reported experiments show that the fine-tuned model outperforms other advanced models at classifying both the relevance of a perception to a value and its tendency toward that value.
- **Application to Human Blog Data**: GPV is applied to 791 blog posts and evaluated for stability, construct validity, concurrent validity, and predictive validity. The results show it to be superior to traditional tools.
- **Measurement of LLM Values**: 17 LLMs are evaluated under four different value theories. The results show that GPV significantly outperforms existing tools in construct validity, and they reveal how different value systems relate to LLM safety.

In summary, the paper aims to improve the accuracy and flexibility of value measurement for both humans and LLMs by introducing a new LLM-driven psychometric method.
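The pipeline described above (parse a text into perceptions, classify each perception for value relevance and tendency, then aggregate) might be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the sentence splitter and keyword classifier are hypothetical stand-ins for the LLM parser and the fine-tuned Llama 3 model, the value names and the `supports`/`opposes` labels are illustrative, and averaging is an assumed aggregation rule.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Assumed mapping from a tendency label to a numeric score (hypothetical labels).
TENDENCY_SCORE = {"supports": 1.0, "opposes": -1.0}


def split_into_perceptions(text: str) -> List[str]:
    """Stand-in parser: treats each sentence as one perception.

    The paper uses an LLM for this step; sentence splitting is only a placeholder.
    """
    normalized = text.replace("!", ".").replace("?", ".")
    return [part.strip() for part in normalized.split(".") if part.strip()]


def toy_classifier(perception: str) -> List[Tuple[str, str]]:
    """Stand-in for the fine-tuned model: keyword lookup returning (value, tendency).

    An empty list means the perception is not relevant to any known value.
    """
    rules = {
        "help": ("benevolence", "supports"),
        "ignore the rules": ("conformity", "opposes"),
    }
    lowered = perception.lower()
    return [label for keyword, label in rules.items() if keyword in lowered]


def measure_values(
    text: str,
    classify: Callable[[str], List[Tuple[str, str]]] = toy_classifier,
) -> Dict[str, float]:
    """Average the tendency scores of all relevant perceptions, per value."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for perception in split_into_perceptions(text):
        for value, tendency in classify(perception):
            scores[value].append(TENDENCY_SCORE[tendency])
    return {value: sum(vals) / len(vals) for value, vals in scores.items()}


if __name__ == "__main__":
    blog = "I love to help my neighbors. I often ignore the rules. Helping others matters."
    print(measure_values(blog))
```

Passing a different `classify` callable is how a real model would be plugged in; the aggregation step stays the same regardless of how perceptions are labeled.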

Measuring Human and AI Values based on Generative Psychometrics with Large Language Models

Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches

ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models

ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models

Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models

CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility

High-Dimension Human Value Representation in Large Language Models

CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses

Measuring Spiritual Values and Bias of Large Language Models

An Overview of Self-Commutating Converters and Their Application in Transmission and Distribution

Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing

From Values to Opinions: Predicting Human Behaviors and Stances Using Value-Injected Large Language Models

Evaluating and Improving Value Judgments in AI: A Scenario-Based Study on Large Language Models' Depiction of Social Conventions

AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories

Transforming Assessment: The Impacts and Implications of Large Language Models and Generative AI

Applying and Evaluating Large Language Models in Mental Health Care: A Scoping Review of Human-Assessed Generative Tasks

Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values

Using large language models in psychology

Large Language Models as Superpositions of Cultural Perspectives