FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas

Yu Lei, Hao Liu, Chengxing Xie, Songjia Liu, Zhiyu Yin, Canyu Chen, Guohao Li, Philip Torr, Zhen Wu

2024-10-17

Abstract: AI alignment is a pivotal issue for AI control and safety. It should consider not only value-neutral human preferences but also moral and ethical considerations. In this study, we introduced FairMindSim, which simulates moral dilemmas through a series of unfair scenarios. We used LLM agents to simulate human behavior, ensuring alignment across various stages. We refer to the socioeconomic motivations that drive both humans and LLM agents, acting as bystanders, to intervene in unjust situations involving others as beliefs. To explore these beliefs and how they interact to influence individual behavior, we incorporated knowledge from relevant sociological fields and proposed the Belief-Reward Alignment Behavior Evolution Model (BREM), based on the recursive reward model (RRM). Our findings indicate that, behaviorally, GPT-4o exhibits a stronger sense of social justice, while humans display a richer range of emotions. Additionally, we discussed the potential impact of emotions on behavior. This study provides a theoretical foundation for applications in aligning LLMs with altruistic values.

Computational Engineering, Finance, and Science, Artificial Intelligence

What problem does this paper attempt to address?

The main problem this paper addresses is the alignment between large language models (LLMs) and humans in behavior, emotion, and belief when facing moral dilemmas. Specifically, by building a simulation system named FairMindSim, the researchers explore and compare the behavioral decisions, emotional responses, and underlying beliefs that drive human participants and LLM agents when they witness unjust situations. The research focuses on the following aspects:

1. **Value alignment perspective**: It examines the moral dilemmas faced by LLMs from a psychological perspective, providing theoretical support for the interdisciplinary field of AI and psychology.
2. **Simulation of moral dilemmas**: It designs a series of unfair situations that simulate moral dilemmas and, while adhering to ethical standards for psychological research, compares how human and LLM agents behave and feel in these situations.
3. **Belief-Reward Alignment Behavior Evolution Model (BREM)**: Building on the Recursive Reward Model (RRM) and relevant psychological theories, it proposes BREM to explore the relationship between belief evolution and decision-making, compare the beliefs of human and LLM agents, and discuss the influence of emotion.
4. **Experimental results**: GPT-4o shows a stronger sense of fairness and social justice, while humans display a richer range of emotions, which may in turn affect their decisions.

Through these studies, the paper provides a theoretical basis and technical methods for developing LLMs that are aligned with human values, especially in social interactions that require fairness and justice.
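The summary does not spell out the exact game FairMindSim uses. As a minimal sketch, assuming a third-party-punishment-style round in which an allocator splits money with a recipient and a bystander may pay a cost to intervene, a series of unfair scenarios and the simplest behavioral comparison (how often a bystander intervenes) could be represented as below. `Scenario`, `threshold_policy`, `run_session`, `intervention_rate`, and the unfairness measure are hypothetical names and choices made for illustration. They are not the authors' code, and the threshold rule is only a stand-in for a human participant's or an LLM agent's actual decision, not the BREM model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# HYPOTHETICAL sketch: not FairMindSim's implementation or the BREM model.


@dataclass(frozen=True)
class Scenario:
    """One assumed unfair round: an allocator splits `total` with a recipient."""
    total: int              # amount the allocator divides
    offer: int              # amount the recipient receives
    intervention_cost: int  # what the bystander pays to intervene

    @property
    def unfairness(self) -> float:
        # 0.0 for an equal (or generous) split, 1.0 when the recipient gets nothing
        fair_share = self.total / 2
        return max(0.0, 1.0 - self.offer / fair_share)


# A bystander policy maps (scenario, current endowment) to "intervene or not".
Policy = Callable[[Scenario, int], bool]


def threshold_policy(threshold: float) -> Policy:
    """Hypothetical policy: intervene when unfairness reaches `threshold` and it is affordable."""
    def decide(s: Scenario, endowment: int) -> bool:
        return s.unfairness >= threshold and endowment >= s.intervention_cost
    return decide


def run_session(scenarios: List[Scenario], policy: Policy,
                endowment: int) -> Tuple[List[bool], int]:
    """Play the rounds in order; intervening deducts its cost from the bystander's endowment."""
    decisions: List[bool] = []
    for s in scenarios:
        intervene = policy(s, endowment)
        if intervene:
            endowment -= s.intervention_cost
        decisions.append(intervene)
    return decisions, endowment


def intervention_rate(decisions: List[bool]) -> float:
    """Fraction of rounds in which the bystander intervened."""
    return sum(decisions) / len(decisions) if decisions else 0.0


rounds = [Scenario(10, 5, 1), Scenario(10, 2, 1), Scenario(10, 0, 1)]
decisions, left = run_session(rounds, threshold_policy(0.5), endowment=10)
print(decisions, left, intervention_rate(decisions))
# → [False, True, True] 8 0.6666666666666666
```

Replacing `threshold_policy` with a function that queries an LLM agent, or with recorded human choices, would let the same loop compare intervention rates between the two groups; that substitution is likewise an assumption rather than something the summary specifies.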
