Abstract: Reinforcement Learning (RL) empowers agents to acquire various skills by learning from reward signals. Unfortunately, designing high-quality instance-level rewards often demands significant effort. An emerging alternative, RL with delayed reward, focuses on learning from rewards presented periodically, which can be obtained from human evaluators assessing the agent's performance over sequences of behaviors. However, traditional methods in this domain assume that underlying Markovian rewards exist and that the observed delayed reward is simply the sum of instance-level rewards; both assumptions often fail to hold in real-world scenarios. In this paper, we introduce the problem of RL from Composite Delayed Reward (RLCoDe), which generalizes traditional RL from delayed rewards by removing these strong assumptions. We suggest that the delayed reward may arise from a more complex structure reflecting the overall contribution of the sequence. To address this problem, we present a framework for modeling composite delayed rewards, using a weighted sum of non-Markovian components to capture the different contributions of individual steps. Building on this framework, we propose the Composite Delayed Reward Transformer (CoDeTr), which incorporates a specialized in-sequence attention mechanism to model these contributions. We conduct experiments on challenging locomotion tasks where the agent receives delayed rewards computed from composite functions of observable step rewards. The experimental results indicate that CoDeTr consistently outperforms baseline methods across the evaluated metrics. Additionally, we demonstrate that it identifies the most significant time steps within the sequence and predicts rewards that closely reflect the environment feedback.
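To make the contrast between a simple-sum delayed reward and a weighted-sum composite concrete, the following is a minimal sketch, not the CoDeTr implementation. It assumes the step-level components are already given as plain numbers and that the per-step weights come from a softmax over hand-set scores; in the paper the weights are produced by the in-sequence attention mechanism. The function names and the example values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sum_delayed_reward(step_rewards):
    """Traditional assumption: the delayed reward is the plain sum of step rewards."""
    return sum(step_rewards)

def composite_delayed_reward(step_rewards, scores):
    """Composite form: a weighted sum of step-level components.

    `scores` are hypothetical importance scores, one per step; they are
    normalized with a softmax so the weights are positive and sum to 1.
    """
    weights = softmax(scores)
    return sum(w * r for w, r in zip(weights, step_rewards))

# Hypothetical sequence in which the third step matters most.
step_rewards = [0.1, 0.2, 2.0, 0.1]

total = sum_delayed_reward(step_rewards)
uniform = composite_delayed_reward(step_rewards, [0.0, 0.0, 0.0, 0.0])
peaked = composite_delayed_reward(step_rewards, [0.0, 0.0, 5.0, 0.0])
```

With equal scores the composite reduces to the mean of the step rewards, while a peaked score vector makes the delayed reward track the most significant step, which is the kind of structure a simple sum cannot express.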
