Abstract: Recent advancements in Large Language Models (LLMs) have revolutionized the AI field but also pose potential safety and ethical risks. Deciphering LLMs' embedded values becomes crucial for assessing and mitigating their risks. Despite extensive investigation into LLMs' values, previous studies heavily rely on human-oriented value systems in social sciences. Then, a natural question arises: Do LLMs possess unique values beyond those of humans? Delving into it, this work proposes a novel framework, ValueLex, to reconstruct LLMs' unique value system from scratch, leveraging psychological methodologies from human personality/value research. Based on Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs, synthesizing a taxonomy that culminates in a comprehensive value framework via factor analysis and semantic clustering. We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system. Based on this system, we further develop tailored projective tests to evaluate and analyze the value inclinations of LLMs across different model sizes, training methods, and data sources. Our framework fosters an interdisciplinary paradigm of understanding LLMs, paving the way for future AI alignment and regulation.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

The paper asks three questions: Do large language models (LLMs) possess unique values beyond those of humans? If such values exist, what are they like? How can these values be evaluated?

### Background and motivation

In recent years, large language models have greatly advanced the field of artificial intelligence, but they have also introduced potential safety and ethical risks. Understanding the values embedded in these models is crucial for evaluating and mitigating these risks. However, most existing research evaluates the values of LLMs using human value systems from the social sciences. This raises a natural question: Do LLMs have unique, non-human values?

### Research methods

To address these questions, the paper proposes a new framework, **ValueLex**, which constructs the unique value system of LLMs from scratch and evaluates their value tendencies. It has two stages:

1. **Generative value construction**:
   - **Lexical Hypothesis**: The important values of LLMs are hypothesized to exist in the form of single words in their internal parameter spaces.
   - **Value extraction**: Through designed abductive reasoning and summarization, value-descriptive words are collected from more than 30 LLMs with different settings.
   - **Factor analysis and semantic clustering**: Factor analysis and semantic clustering identify the most representative value-descriptive words, yielding a comprehensive value framework with three core dimensions (Competence, Character, Integrity) and their subdimensions.
2. **Projective test evaluation**:
   - **Test creation**: A series of sentence-completion tests is designed, and LLMs generate continuations of these sentences that reflect their values.
   - **Result analysis**: A classifier maps the generated responses to the quantified value space, and a score is calculated for each model on each value dimension.

### Main findings

1. **Competence dimension**: LLMs generally value Competence, but models differ. For example, Mistral and Tulu place more emphasis on Competence, while Baichuan leans more toward Integrity.
2. **Influence of training methods**: Pretrained models do not show significant value tendencies; instruction tuning enhances the consistency of each dimension, and alignment further diversifies the values.
3. **Model size**: Larger models show a stronger preference for Competence, while the other dimensions may be neglected.

### Contributions

- **First revelation**: Reveals the three core value dimensions of LLMs, their subdimensions, and their structure.
- **Evaluation tool**: Develops tailored projective tests for evaluating the latent value tendencies of LLMs.
- **Impact analysis**: Explores how factors such as model size and training methods influence the value tendencies of LLMs, and discusses the differences between LLM values and human values.

### Conclusion

Although the value system of LLMs has certain similarities with that of humans, the values of LLMs are more specialized and reflect explicit human expectations. This indicates that intentional guidance (such as alignment) can effectively change the underlying value system of AI. However, this system still lacks the dynamic and motivational aspects of human values, which points to a direction for further improving AI values.
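The result-analysis step described under Research methods (a classifier maps sentence completions to value dimensions, then per-dimension scores are computed) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the keyword lexicon `TOY_LEXICON`, the `classify` function, and the scoring rule (the fraction of completions expressing each dimension) are all assumptions made for illustration, and the example completions are invented.

```python
from collections import Counter

# The three core dimensions named in the abstract.
DIMENSIONS = ("Competence", "Character", "Integrity")

# ASSUMPTION: a toy keyword lexicon standing in for the paper's classifier.
TOY_LEXICON = {
    "Competence": {"accurate", "efficient", "helpful"},
    "Character": {"kind", "empathetic", "patient"},
    "Integrity": {"honest", "fair", "transparent"},
}


def classify(completion: str) -> list[str]:
    """Return every dimension whose toy keywords appear in the completion."""
    words = {w.strip(".,;:!?").lower() for w in completion.split()}
    return [d for d in DIMENSIONS if words & TOY_LEXICON[d]]


def dimension_scores(completions: list[str]) -> dict[str, float]:
    """ASSUMED scoring rule: fraction of completions expressing each dimension."""
    counts = Counter(d for c in completions for d in classify(c))
    n = len(completions)
    return {d: (counts[d] / n if n else 0.0) for d in DIMENSIONS}


# Invented sentence completions, as a model might produce in a projective test.
completions = [
    "My guiding principle is to be accurate and honest.",
    "Above all, I try to be kind.",
    "I should always be efficient.",
    "What matters most to me is being transparent.",
]
scores = dimension_scores(completions)
print(scores)
```

Running the same scoring over completions from several models would give one score vector per model, which is the kind of comparison the findings rely on. A real evaluation would replace the keyword lookup with a trained classifier.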

Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches

Measuring Human and AI Values based on Generative Psychometrics with Large Language Models

Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values

Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values

High-Dimension Human Value Representation in Large Language Models

Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?

CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses

Large Language Models as Superpositions of Cultural Perspectives

The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models

Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning

CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility

ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models

ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

LocalValueBench: A Collaboratively Built and Extensible Benchmark for Evaluating Localized Value Alignment and Ethical Safety in Large Language Models

Can Language Models Reason about Individualistic Human Values and Preferences?

Assessing LLMs for Moral Value Pluralism

Assessment of Multimodal Large Language Models in Alignment with Human Values

Do LLMs have Consistent Values?

Strong and weak alignment of large language models with human values