Abstract: In reinforcement learning, the Markov Decision Process (MDP) framework typically operates under a blocking paradigm, assuming a static environment during the agent's decision-making and stationary agent behavior while the environment executes its actions. This static model often proves inadequate for real-time tasks, as it lacks the flexibility to handle concurrent changes in both the agent's decision-making process and the environment's dynamic responses. Contemporary solutions, such as linear interpolation or state space augmentation, attempt to address the asynchronous nature of delayed states and actions in real-time environments. However, these methods frequently require precise delay measurements and may fail to fully capture the complexities of delay dynamics. To address these challenges, we introduce a minimal information set that encapsulates concurrent information during agent-environment interactions, serving as the foundation of our real-time decision-making framework. The traditional blocking-mode MDP is then reformulated as a Minimal Information State Markov Decision Process (MISMDP), aligning more closely with the demands of real-time environments. Within this MISMDP framework, we propose the "Minimal information set for Real-time tasks using Actor-Critic" (MRAC), a general approach for addressing delay issues in real-time tasks, supported by a rigorous theoretical analysis of Q-function convergence. Extensive experiments across both discrete and continuous action space environments demonstrate that MRAC outperforms state-of-the-art algorithms, delivering superior performance and generalization in managing delays within real-time tasks.
