Abstract: Continuous POMDPs with general belief-dependent rewards are notoriously difficult to solve online. In this paper, we present a complete provable theory of adaptive multilevel simplification for two settings: a given, externally constructed belief tree, and MCTS that constructs the belief tree on the fly using an exploration technique. Our theory makes it possible to accelerate POMDP planning with belief-dependent rewards without any sacrifice in the quality of the obtained solution. We rigorously prove each theoretical claim in the proposed unified theory. Using the general theoretical results, we present three algorithms to accelerate continuous POMDP online planning with belief-dependent rewards. Two of our algorithms, SITH-BSP and LAZY-SITH-BSP, can be utilized on top of any method that constructs a belief tree externally. The third algorithm, SITH-PFT, is an anytime MCTS method that permits plugging in any exploration technique. All our methods are guaranteed to return exactly the same optimal action as their unsimplified equivalents. We replace the costly computation of information-theoretic rewards with novel adaptive upper and lower bounds, which we derive in this paper and which are of independent interest. We show that they are easy to calculate and can be tightened on demand by our algorithms. Our approach is general; namely, any bounds that monotonically converge to the reward can be utilized to achieve significant speedup without any loss in performance. Our theory and algorithms support the challenging setting of continuous states, actions, and observations. The beliefs can be parametric, or general and represented by weighted particles. We demonstrate in simulation a significant speedup in planning compared to baseline approaches with guaranteed identical performance.
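The core idea stated in the abstract, that bounds which monotonically converge to the reward can replace the exact reward without changing the selected action, can be illustrated with a small one-step sketch. This is not SITH-BSP, LAZY-SITH-BSP, or SITH-PFT: the function names (`make_bounds`, `select_action`), the linear tightening schedule, and the refinement rule (tighten the widest overlapping interval first) are all assumptions made only for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical type: maps a simplification level to (lower, upper) reward bounds.
Bounds = Callable[[int], Tuple[float, float]]


def make_bounds(exact: float, slack: float, max_level: int) -> Bounds:
    """Toy bounds that tighten monotonically and equal `exact` at max_level."""
    def bounds(level: int) -> Tuple[float, float]:
        width = slack * (max_level - level) / max_level
        return exact - width, exact + width
    return bounds


def select_action(
    bounds: Dict[str, Bounds], max_level: int
) -> Tuple[str, Dict[str, int]]:
    """Pick the best action using only as much bound tightening as needed."""
    levels = {a: 0 for a in bounds}
    while True:
        current = {a: bounds[a](levels[a]) for a in bounds}
        # Candidate: the action with the highest lower bound.
        best = max(current, key=lambda a: current[a][0])
        # Rivals: actions whose upper bound still exceeds the candidate's lower bound.
        rivals = [a for a in current
                  if a != best and current[a][1] > current[best][0]]
        if not rivals:
            return best, levels  # no overlap: the choice is already certain
        # Tighten the widest overlapping interval that can still be refined.
        refinable = [a for a in [best] + rivals if levels[a] < max_level]
        if not refinable:
            return best, levels  # all bounds exact; only ties remain
        widest = max(refinable, key=lambda a: current[a][1] - current[a][0])
        levels[widest] += 1


MAX_LEVEL = 4
exact_rewards = {"a1": 1.0, "a2": 0.4, "a3": 0.9}  # made-up values
toy_bounds = {a: make_bounds(r, slack=1.0, max_level=MAX_LEVEL)
              for a, r in exact_rewards.items()}
action, used_levels = select_action(toy_bounds, MAX_LEVEL)
print(action, used_levels)
```

In this toy setup the clearly inferior action is discarded before its bounds are fully tightened, while the selected action matches the one obtained from the exact rewards. In the paper's setting the expensive quantity is an information-theoretic reward over beliefs, and the bounds are those the authors derive; the sketch only mirrors the pruning logic.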

Accelerating Point-Based POMDP Algorithms via Greedy Strategies

Hybrid Heuristic Online Planning for POMDPs

Anytime Point-Based Approximations for Large POMDPs

Adaptive Online Packing-guided Search for POMDPs

Point-Based POMDP Algorithms: Improved Analysis and Implementation

A Probabilistic Greedy Search Value Iteration Algorithm for POMDP

FHHOP: a Factored Hybrid Heuristic Online Planning Algorithm for Large POMDPs

A Partially Observable Monte Carlo Planning Algorithm Based on Path Modification

A point-reduced POMDP value iteration algorithm with application to robot navigation

Improving Online POMDP Planning Algorithms with Decaying Q Value

Scaling Long-Horizon Online POMDP Planning via Rapid State Space Sampling

No Compromise in Solution Quality: Speeding Up Belief-dependent Continuous POMDPs via Adaptive Multilevel Simplification

PODDP: Partially Observable Differential Dynamic Programming for Latent Belief Space Planning

Multi-Objective Safe-Interval Path Planning With Dynamic Obstacles

Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives

A Search Space Utility Optimization Based Online POMDP Planning Algorithm

Online algorithms for POMDPs with continuous state, action, and observation spaces

Research on Multi-Agent Task Allocation and Path Planning Based on Pri-MADDPG

A Scalable Model-Free Recurrent Neural Network Framework for Solving POMDPs

Bridging the Gap between Partially Observable Stochastic Games and Sparse POMDP Methods

A Hybrid Heuristic Value Iteration Algorithm for POMDP