Towards Socially and Morally Aware RL agent: Reward Design With LLM

Zhaoyue Wang
2024-05-31
Abstract: When we design and deploy a Reinforcement Learning (RL) agent, the reward function motivates the agent to achieve an objective. An incorrect or incomplete specification of the objective can result in behavior that does not align with human values: the agent may fail to adhere to social and moral norms, which are ambiguous and context dependent, and may cause undesired outcomes such as negative side effects and unsafe exploration. Previous work has manually defined reward functions to avoid negative side effects, used human oversight for safe exploration, or used foundation models as planning tools. This work studies how the understanding of morality and social norms in Large Language Models (LLMs) can be leveraged in RL methods augmented with safe exploration. It evaluates the language model's results against human feedback and demonstrates the language model's capability as a direct reward signal.
Artificial Intelligence
What problem does this paper attempt to address?
This paper attempts to address the problem of how to align the behavior of reinforcement learning (RL) agents with human values, particularly avoiding actions that violate social and moral norms, during the design and deployment of RL agents. Specifically, the paper focuses on the following points:

1. **Design of the reward function**: Manually defined reward functions may not cover all scenarios, so agents pursuing their goals can behave in ways that do not align with human values, for example by causing negative side effects or exploring unsafely.
2. **Adherence to social and moral norms**: Social and moral norms are often ambiguous and context-dependent, which makes it very difficult to ensure that agent behavior conforms to them through traditional methods.
3. **Safe exploration**: During exploration, and especially during trial-and-error learning, agents may take actions that are harmful to the environment or to themselves. Ensuring the safety of the exploration process is an important issue.

To address these challenges, the paper proposes an approach that leverages the ability of large language models (LLMs) to understand social and moral norms in order to enhance RL methods. The aim is to enable agents to explore the environment more safely and avoid negative side effects. The paper's experiments evaluate the language model's outputs against human feedback and demonstrate that a language model can serve as a direct reward signal, guiding agents toward actions that better align with social and moral norms.
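The idea of using a language model as a direct reward signal can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `Judge`, `keyword_judge`, and `shaped_reward`, the score range, and the additive weighting are all assumptions made for illustration. A simple keyword check stands in for the LLM call so that the example runs on its own.

```python
from typing import Callable

# Hypothetical interface: a judge maps a textual description of a
# (state, action) pair to a norm score, where negative means "violates norms".
Judge = Callable[[str], float]


def keyword_judge(description: str) -> float:
    """Stand-in for an LLM query (assumption): flags obviously harmful wording."""
    harmful = ("break", "harm", "steal")
    text = description.lower()
    return -1.0 if any(word in text for word in harmful) else 0.0


def shaped_reward(task_reward: float, description: str,
                  judge: Judge, weight: float = 1.0) -> float:
    """Add the judge's norm score, scaled by `weight`, to the task reward."""
    return task_reward + weight * judge(description)


if __name__ == "__main__":
    # The task reward alone would favor the shortcut; the judge penalizes it.
    print(shaped_reward(1.0, "The agent breaks a vase to reach the goal",
                        keyword_judge, weight=2.0))
    print(shaped_reward(1.0, "The agent walks around the vase", keyword_judge))
```

In a real setup, `keyword_judge` would be replaced by a function that prompts a language model with the described situation and parses its answer into a score. How that score is combined with the task reward is a design choice that this sketch does not settle.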