Abstract: By properly utilizing a learned environment model, model-based reinforcement learning methods can improve sample efficiency on decision-making problems. Beyond using the learned environment model to train a policy, the success of MCTS-based methods shows that directly incorporating the learned model as a planner to make decisions might be more effective. However, when the action space is high-dimensional and continuous, planning directly with the learned model is costly and non-trivial because of two challenges: (1) the infinite number of candidate actions and (2) the temporal dependency between actions at different timesteps. To address these challenges, inspired by Differential Dynamic Programming (DDP) in optimal control theory, we design a novel Policy Optimization with Model Planning (POMP) algorithm, which incorporates a carefully designed Deep Differential Dynamic Programming (D3P) planner into the model-based RL framework. In the D3P planner, (1) to plan effectively in the continuous action space, we construct a locally quadratic programming problem that uses a gradient-based optimization process to replace search. (2) To take the temporal dependency between actions at different timesteps into account, we use the latest updated actions of the previous timesteps (i.e., steps $1, \cdots, h-1$) to update the action of the current step (i.e., step $h$), instead of updating all actions simultaneously. We theoretically prove the convergence rate of the D3P planner and analyze the effect of the feedback term. In practice, to apply the neural-network-based D3P planner effectively in reinforcement learning, we use the policy network to initialize the action sequence and keep the action updates conservative during planning. Experiments demonstrate that POMP consistently improves sample efficiency on widely used continuous control tasks. Our code is released at https://github.com/POMP-D3P/POMP-D3P.
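
The sequential update idea in the abstract, where the action at step $h$ is refreshed using the already-updated actions of steps $1, \cdots, h-1$, can be illustrated with a toy sketch. This is not the D3P planner itself: it uses a plain first-order step with finite-difference gradients rather than the locally quadratic problem and feedback term, and the names `model`, `reward`, `rollout_return`, and `plan`, the linear dynamics, and all constants are hypothetical stand-ins chosen only for illustration.

```python
def model(s, a):
    # Hypothetical "learned" dynamics: a toy linear system standing in
    # for a neural network model.
    return 0.9 * s + a


def reward(s, a):
    # Hypothetical reward: keep the state near zero, lightly penalize actions.
    return -(s ** 2) - 0.1 * (a ** 2)


def rollout_return(s0, actions):
    # Unroll the model along the action sequence and sum the rewards.
    s, total = s0, 0.0
    for a in actions:
        total += reward(s, a)
        s = model(s, a)
    return total - s ** 2  # terminal penalty on the final state


def plan(s0, init_actions, n_iters=50, step=0.05, eps=1e-4):
    # init_actions plays the role of a policy-proposed action sequence;
    # the small step size keeps each update conservative.
    actions = list(init_actions)
    for _ in range(n_iters):
        for h in range(len(actions)):
            # Central-difference gradient of the return w.r.t. action h.
            # actions[:h] already hold this sweep's updated values, so
            # step h sees the latest earlier actions (sequential update).
            up, down = actions[:], actions[:]
            up[h] += eps
            down[h] -= eps
            grad = (rollout_return(s0, up) - rollout_return(s0, down)) / (2 * eps)
            actions[h] += step * grad
    return actions


if __name__ == "__main__":
    s0, init = 1.0, [0.0] * 5
    planned = plan(s0, init)
    print("initial return:", rollout_return(s0, init))
    print("planned return:", rollout_return(s0, planned))
```

Updating all actions simultaneously would instead compute every gradient from the same stale sequence before applying any step; the inner loop above differs only in that each step is applied before the next gradient is evaluated.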

TOMA: Topological Map Abstraction for Reinforcement Learning

Active Neural Topological Mapping for Multi-Agent Exploration

Graph learning-based generation of abstractions for reinforcement learning

TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching

TopoNav: Topological Navigation for Efficient Exploration in Sparse Reward Environments

MGRL: Graph neural network based inference in a Markov network with reinforcement learning for visual navigation

Leveraging Topological Maps in Deep Reinforcement Learning for Multi-Object Navigation

TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation

Learning to plan with uncertain topological maps

Learning Cognitive Maps from Transformer Representations for Efficient Planning in Partially Observed Environments

Topological Experience Replay

DRL-Tomo: a deep reinforcement learning-based approach to augmented data generation for network tomography

MR-TopoMap: Multi-Robot Exploration Based on Topological Map in Communication Restricted Environment

Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction

Hierarchical Representations and Explicit Memory: Learning Effective Navigation Policies on 3D Scene Graphs using Graph Neural Networks

A transformer-based deep reinforcement learning approach to spatial navigation in a partially observable Morris Water Maze

Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Learning Markov State Abstractions for Deep Reinforcement Learning

Making Better Decision by Directly Planning in Continuous Control

MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation

Geometric Active Exploration in Markov Decision Processes: the Benefit of Abstraction