Time-Aware Q-Networks: Resolving Temporal Irregularity for Deep Reinforcement Learning

Abstract: Deep Reinforcement Learning (DRL) has shown outstanding performance in inducing effective action policies that maximize expected long-term return on many complex tasks. Much DRL work has focused on sequences of events with discrete time steps, ignoring the irregular time intervals between consecutive events. In many real-world domains, however, data consist of temporal sequences with irregular time intervals, and it is important to consider the intervals between temporal events to capture latent progressive patterns of states. In this work, we present a general Time-Aware RL framework, Time-aware Q-Networks (TQN), which takes physical time intervals into account within a deep RL framework. TQN addresses time irregularity from two aspects: 1) the elapsed time since the previous observation and the expected time of the next observation, used for time-aware state approximation, and 2) the action time window into the future, used for time-aware discounting of rewards. Experimental results show that by capturing the underlying structures in sequences with time irregularities from both aspects, TQNs significantly outperform DQN in four types of contexts with irregular time intervals. More specifically, in classic RL tasks such as CartPole and MountainCar and on the Atari benchmark with randomly segmented time intervals, time-aware discounting is the more important of the two components, whereas in real-world tasks with intrinsic time intervals, such as nuclear reactor operation and septic patient treatment, both time-aware state approximation and time-aware discounting are crucial. Moreover, to improve the agent's learning capacity, we explored three boosting methods: Double networks, Dueling networks, and Prioritized Experience Replay. For the two real-world tasks, combining all three boosting methods with TQN is especially effective.
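Time-aware discounting can be illustrated with a minimal Python sketch. It assumes that the discount factor `gamma` is defined per unit of physical time, so that an action whose time window lasts `delta_t` units is discounted by `gamma ** delta_t`. This exponential form and the function names below are illustrative assumptions, and the exact formulation used in TQN may differ.

```python
from typing import Sequence


def time_aware_discount(gamma: float, delta_t: float) -> float:
    """Effective discount for an action window lasting `delta_t` time units.

    Assumption: `gamma` is a per-unit-time discount factor, so the effective
    discount decays exponentially with elapsed physical time.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if delta_t < 0:
        raise ValueError("delta_t must be non-negative")
    return gamma ** delta_t


def time_aware_td_target(reward: float, next_q_values: Sequence[float],
                         gamma: float, delta_t: float, done: bool) -> float:
    """Q-learning target r + gamma**delta_t * max_a' Q(s', a').

    No bootstrapping from the next state on terminal transitions.
    """
    if done:
        return reward
    return reward + time_aware_discount(gamma, delta_t) * max(next_q_values)


def time_aware_return(rewards: Sequence[float], intervals: Sequence[float],
                      gamma: float) -> float:
    """Discounted return of a trajectory with irregular action windows.

    intervals[k] is the duration of the k-th action window; the return is
    accumulated backwards as G_k = r_k + gamma**intervals[k] * G_{k+1}.
    """
    if len(rewards) != len(intervals):
        raise ValueError("rewards and intervals must have the same length")
    g = 0.0
    for r, dt in zip(reversed(rewards), reversed(intervals)):
        g = r + time_aware_discount(gamma, dt) * g
    return g
```

With every interval equal to 1, this reduces to the fixed-step discounting used by DQN: `time_aware_return([1, 1, 1], [1, 1, 1], 0.5)` gives 1.75, whereas stretching the second action window to 2 time units (`[1, 2, 1]`) gives 1.625, because the reward that arrives after the longer window is discounted more heavily.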