Abstract: In practice, reinforcement learning (RL) agents are often trained with a possibly imperfect proxy reward function, which may lead to a human-agent alignment issue (i.e., the learned policy either converges to non-optimal performance with low cumulative rewards, or achieves high cumulative rewards but in an undesired manner). To tackle this issue, we consider a framework where a human labeler can provide additional feedback in the form of corrective actions, which express the labeler's action preferences, although this feedback may be imperfect as well. In this setting, to obtain a better-aligned policy guided by both learning signals, we propose a novel value-based deep RL algorithm called Iterative learning from Corrective actions and Proxy rewards (ICoPro), which cycles through three phases: (1) Solicit sparse corrective actions from a human labeler on the agent's demonstrated trajectories; (2) Incorporate these corrective actions into the Q-function using a margin loss to enforce adherence to the labeler's preferences; (3) Train the agent with standard RL losses regularized with a margin loss to learn from proxy rewards and propagate the Q-values learned from human feedback. Another novel design in our approach is to integrate pseudo-labels from the target Q-network to reduce human labor and further stabilize training. We experimentally validate our approach on a variety of tasks (Atari games and autonomous highway driving). On the one hand, using proxy rewards with different levels of imperfection, our method aligns better with human preferences and is more sample-efficient than baseline methods. On the other hand, facing corrective actions with different types of imperfection, our method can overcome the non-optimality of this feedback thanks to the guidance from proxy rewards.
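The abstract does not give the exact form of the margin loss used in phases (2) and (3). As a rough illustration only, the sketch below assumes a large-margin classification loss of the kind commonly used when learning Q-functions from demonstrations: it pushes the Q-value of the labeler's corrective action above every other action by at least a margin. The function name `margin_loss`, the argument names, and the default margin value are hypothetical and not taken from the paper.

```python
import numpy as np

def margin_loss(q_values, labeled_actions, margin=0.8):
    """Mean large-margin loss over a batch (assumed form, not the paper's exact loss).

    q_values:        array of shape (batch, n_actions) with Q(s, a) estimates.
    labeled_actions: array of shape (batch,) with the labeler's corrective actions.
    margin:          hypothetical margin added to every non-labeled action.
    """
    q_values = np.asarray(q_values, dtype=float)
    labeled_actions = np.asarray(labeled_actions, dtype=int)
    rows = np.arange(q_values.shape[0])

    # Penalty is `margin` for every action except the labeled one, where it is 0.
    penalties = np.full_like(q_values, margin)
    penalties[rows, labeled_actions] = 0.0

    # max_a [Q(s, a) + penalty(a)] - Q(s, a_labeled); this is always >= 0.
    best_augmented = np.max(q_values + penalties, axis=1)
    q_labeled = q_values[rows, labeled_actions]
    return float(np.mean(best_augmented - q_labeled))
```

Under this assumed form, the loss is zero for a state only when the labeled action's Q-value exceeds all others by at least the margin, so minimizing it alongside the usual RL loss nudges the greedy policy toward the corrective actions without fixing the Q-values themselves.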

Reconstructing Actions To Explain Deep Reinforcement Learning

Time‐in‐action RL

Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning

A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges

Experiential Explanations for Reinforcement Learning

How Do You Act? An Empirical Study to Understand Behavior of Deep Reinforcement Learning Agents

Explaining Reinforcement Learning to Mere Mortals: An Empirical Study

A Closer Look at Reward Decomposition for High-Level Robotic Explanations

Verbal Explanations for Deep Reinforcement Learning Neural Networks with Attention on Extracted Features

Explaining Deep Q-Learning Experience Replay with SHapley Additive exPlanations

Why the Agent Made that Decision: Explaining Deep Reinforcement Learning with Vision Masks

Explainable Reinforcement Learning via Model Transforms

Causal explanation for reinforcement learning: quantifying state and temporal importance

BET: Explaining Deep Reinforcement Learning through The Error-Prone Decisions

Explaining RL Decisions with Trajectories

Reinforcement Learning From Imperfect Corrective Actions And Proxy Rewards

Counterfactual State Explanations for Reinforcement Learning Agents via Generative Deep Learning

Transparency and Explanation in Deep Reinforcement Learning Neural Networks

Deep Reinforcement Learning With Macro-Actions

On Improving Deep Reinforcement Learning for POMDPs

Architecting and Visualizing Deep Reinforcement Learning Models