Abstract: Designing reward functions that efficiently guide reinforcement learning (RL) agents toward specific behaviors is difficult, because it requires identifying reward structures that are not sparse and that do not inadvertently induce undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outcomes and promote behaviors that are not aligned with the designer's intended goal. Although potential-based reward shaping is often suggested as a remedy, we systematically investigate settings where deploying it often significantly impairs performance. To address these issues, we introduce a new framework that uses a bi-level objective to learn behavior alignment reward functions. These functions integrate auxiliary rewards reflecting a designer's heuristics and domain knowledge with the environment's primary rewards. Our approach automatically determines the most effective way to blend these types of feedback, thereby enhancing robustness against heuristic reward misspecification. Remarkably, it can also adapt an agent's policy optimization process to mitigate suboptimalities resulting from limitations and biases inherent in the underlying RL algorithms. We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges. We investigate heuristic auxiliary rewards of varying quality, some of which are beneficial and others detrimental to the learning process. Our results show that our framework offers a robust and principled way to integrate designer-specified heuristics. It not only addresses key shortcomings of existing approaches but also consistently leads to high-performing solutions, even when given misaligned or poorly specified auxiliary reward functions.
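For readers unfamiliar with the baseline the abstract mentions, the following is a minimal sketch of standard potential-based reward shaping, where the shaping term is F(s, s') = gamma * phi(s') - phi(s). It also shows a fixed-weight blend of primary and auxiliary rewards for contrast. The chain environment, the potential `phi`, and the `blended_reward` helper with its fixed `weight` are hypothetical illustrations; the framework described above learns how to combine the two signals through a bi-level objective, which this sketch does not implement.

```python
from typing import Callable

State = int  # assumption: states are positions on a 1-D chain


def potential_shaping(
    phi: Callable[[State], float], gamma: float
) -> Callable[[State, State], float]:
    """Build the potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""

    def shaping(s: State, s_next: State) -> float:
        return gamma * phi(s_next) - phi(s)

    return shaping


def blended_reward(r_primary: float, r_aux: float, weight: float) -> float:
    """Hypothetical fixed-weight blend; a learned blend would replace `weight`."""
    return r_primary + weight * r_aux


# Hypothetical setup: goal at position 5, potential is negative distance to goal.
GOAL = 5
GAMMA = 0.9


def phi(s: State) -> float:
    return -abs(GOAL - s)


shaping = potential_shaping(phi, GAMMA)

# Moving toward the goal earns a positive shaping bonus, moving away a penalty.
toward = shaping(2, 3)
away = shaping(3, 2)

# Sparse primary reward of 0 on a non-goal step, plus the weighted shaping signal.
step_reward = blended_reward(r_primary=0.0, r_aux=toward, weight=0.5)
```

With a fixed `weight`, a poorly chosen auxiliary signal scales directly into the agent's reward, which is the kind of sensitivity to misspecified heuristics that the abstract sets out to reduce.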

The Guiding Role of Reward Based on Phased Goal in Reinforcement Learning

Shaping Reward Learning Approach from Passive Samples

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning

Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning

Reward Shaping via Meta-Learning

Reward Shaping Based on Optimal-Policy-Free

Tactical Reward Shaping: Bypassing Reinforcement Learning with Strategy-Based Goals

Learning to Shape Rewards Using a Game of Two Partners

Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

Addressing Reward Engineering For Deep Reinforcement Learning On Multi-Stage Task

Learning Task-Distribution Reward Shaping with Meta-Learning

Comprehensive Overview of Reward Engineering and Shaping in Advancing Reinforcement Learning Applications

Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning

Tiered Reward: Designing Rewards for Specification and Fast Learning of Desired Behavior

Temporal Video-Language Alignment Network for Reward Shaping in Reinforcement Learning

Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning

ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization

Adaptively Shaping Reinforcement Learning Agents Via Human Reward

Pseudo Reward and Action Importance Classification for Sparse Reward Problem

Behavior Alignment via Reward Function Optimization