Towards Socially and Morally Aware RL agent: Reward Design With LLM

Zhaoyue Wang

2024-05-31

Abstract: When we design and deploy a Reinforcement Learning (RL) agent, the reward function motivates the agent to achieve an objective. An incorrect or incomplete specification of the objective can result in behavior that does not align with human values: the agent may fail to adhere to social and moral norms, which are ambiguous and context dependent, and may cause undesired outcomes such as negative side effects and unsafe exploration. Previous work has manually defined reward functions to avoid negative side effects, used human oversight for safe exploration, or used foundation models as planning tools. This work studies how Large Language Models' (LLMs') understanding of morality and social norms can be leveraged in safe-exploration-augmented RL methods. It evaluates the language model's results against human feedback and demonstrates the language model's capability as a direct reward signal.

Artificial Intelligence

What problem does this paper attempt to address?

This paper attempts to address the problem of how to align the behavior of reinforcement learning (RL) agents with human values, particularly avoiding actions that violate social and moral norms, during the design and deployment of RL agents. Specifically, the paper focuses on the following points:

1. **Design of the reward function**: Traditional methods of manually defining reward functions may not cover all scenarios, leading to behaviors that do not align with human values when agents pursue their goals, such as negative side effects or unsafe exploration.
2. **Adherence to social and moral norms**: Social and moral norms are often ambiguous and context-dependent, making it very difficult to ensure that agent behavior conforms to these norms through traditional methods.
3. **Safe exploration**: During exploration, agents may take actions that are harmful to the environment or themselves, especially during trial-and-error learning. Ensuring the safety of the exploration process is an important issue.

To address these challenges, the paper proposes an approach that leverages the ability of large language models (LLMs) to understand social and moral norms to enhance RL methods. The aim is to enable agents to explore the environment more safely and avoid negative side effects. Experiments have validated the effectiveness of using language models as direct reward signals and their ability to guide agents to take actions that better align with social and moral norms.
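As a rough illustration of what "a language model as a direct reward signal" could look like in code, the sketch below adds a judge's score to the environment's task reward. Everything in it is an assumption made for illustration and is not taken from the paper: the names `stub_moral_judge` and `shaped_reward`, the additive combination, the `weight` parameter, and the score range. The judge is a keyword stub so the example runs without any model; a real setup would replace it with a prompt to an LLM.

```python
from typing import Callable


def stub_moral_judge(description: str) -> float:
    """Hypothetical stand-in for an LLM call.

    Maps a text description of an action's outcome to a score:
    -1.0 if it looks harmful, 0.0 otherwise. The keyword rule is a
    placeholder only; it is not how a language model would judge.
    """
    harmful_words = ("break", "harm", "steal")
    text = description.lower()
    return -1.0 if any(word in text for word in harmful_words) else 0.0


def shaped_reward(
    task_reward: float,
    description: str,
    judge: Callable[[str], float] = stub_moral_judge,
    weight: float = 1.0,
) -> float:
    """Add the judge's weighted score to the task reward (assumed scheme)."""
    return task_reward + weight * judge(description)


if __name__ == "__main__":
    # Same task reward, different side effects described in text.
    print(shaped_reward(1.0, "The agent walks around the vase."))
    print(shaped_reward(1.0, "The agent breaks the vase to save time."))
```

Passing the judge as a callable keeps the reward logic independent of any particular model or API, so the stub can be swapped for a real LLM query without changing `shaped_reward`.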
