Abstract: In reinforcement learning, the Markov Decision Process (MDP) framework typically operates under a blocking paradigm: it assumes that the environment stays static while the agent makes a decision and that the agent stays idle while the environment executes its action. This blocking model often proves inadequate for real-time tasks, because it cannot handle changes that occur concurrently in the agent's decision-making process and in the environment's dynamic responses. Contemporary solutions, such as linear interpolation or state space augmentation, attempt to address the asynchronous nature of delayed states and actions in real-time environments. However, these methods frequently require precise delay measurements and may fail to fully capture the complexities of delay dynamics. To address these challenges, we introduce a minimal information set that encapsulates concurrent information during agent-environment interactions, serving as the foundation of our real-time decision-making framework. The traditional blocking-mode MDP is then reformulated as a Minimal Information State Markov Decision Process (MISMDP), aligning more closely with the demands of real-time environments. Within this MISMDP framework, we propose "Minimal information set for Real-time tasks using Actor-Critic" (MRAC), a general approach for addressing delay issues in real-time tasks, supported by a rigorous theoretical analysis of Q-function convergence. Extensive experiments across both discrete and continuous action space environments demonstrate that MRAC outperforms state-of-the-art algorithms in both performance and generalization when managing delays in real-time tasks.
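For orientation, the state space augmentation baseline mentioned in the abstract can be sketched as follows. This is a toy illustration of that general, well-known idea (appending the queue of pending actions to the observed state), not the MRAC method or the minimal information set itself. The class name `DelayedActionEnv`, the `step_fn` callback, and the fixed `delay` parameter are hypothetical names introduced here. The sketch assumes a constant, known action delay, which is exactly the kind of precise delay measurement the abstract notes such methods rely on.

```python
from collections import deque


class DelayedActionEnv:
    """Toy environment wrapper (hypothetical): an action chosen now is
    applied `delay` steps later. Assumes a constant, known delay."""

    def __init__(self, step_fn, initial_state, delay, noop_action):
        self.step_fn = step_fn          # (state, action) -> next state
        self.state = initial_state
        # Queue of actions that were chosen but have not been applied yet.
        self.pending = deque([noop_action] * delay)

    def observe(self):
        # Augmented state: the latest environment state together with
        # every pending action, which restores the Markov property
        # when the delay is constant.
        return (self.state, tuple(self.pending))

    def step(self, action):
        # Enqueue the new action, then apply the oldest queued one.
        self.pending.append(action)
        applied = self.pending.popleft()
        self.state = self.step_fn(self.state, applied)
        return self.observe()


# Example: a 1-D integrator where each action is added to the state.
env = DelayedActionEnv(lambda s, a: s + a, initial_state=0, delay=2, noop_action=0)
first = env.step(1)   # the action 1 is queued; a no-op is applied
second = env.step(2)  # still applying a no-op
third = env.step(3)   # the action 1 finally takes effect
```

Note that the augmented state grows with the delay length, and the construction breaks down when the delay is unknown or varies over time.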
