Abstract: In reinforcement learning, the Markov Decision Process (MDP) framework typically operates under a blocking paradigm: it assumes that the environment is static while the agent makes its decision and that the agent's behavior is stationary while the environment executes the chosen action. This static model often proves inadequate for real-time tasks, as it lacks the flexibility to handle concurrent changes in both the agent's decision-making process and the environment's dynamic responses. Contemporary solutions, such as linear interpolation or state space augmentation, attempt to address the asynchronous nature of delayed states and actions in real-time environments. However, these methods frequently require precise delay measurements and may fail to fully capture the complexities of delay dynamics. To address these challenges, we introduce a minimal information set that encapsulates concurrent information during agent-environment interactions, serving as the foundation of our real-time decision-making framework. The traditional blocking-mode MDP is then reformulated as a Minimal Information State Markov Decision Process (MISMDP), aligning more closely with the demands of real-time environments. Within this MISMDP framework, we propose the "Minimal information set for Real-time tasks using Actor-Critic" (MRAC), a general approach for addressing delay issues in real-time tasks, supported by a rigorous theoretical analysis of Q-function convergence. Extensive experiments across both discrete and continuous action space environments demonstrate that MRAC outperforms state-of-the-art algorithms, delivering superior performance and generalization in managing delays within real-time tasks.

A delay-robust method for enhanced real-time reinforcement learning
