Abstract: Partially Observable Markov Decision Processes (POMDPs) offer an elegant framework to model sequential decision making in uncertain environments. Solving POMDPs online is an active area of research and, given the size of real-world problems, approximate solvers are used. Recently, a few approaches have been suggested for solving POMDPs by using MDP solvers in conjunction with imitation learning. MDP-based POMDP solvers work well for some cases, while catastrophically failing for others. The main failure point of such solvers is the lack of motivation for MDP solvers to gain information, since under their assumption the environment is either already known as much as it can be or the uncertainty will disappear after the next step. However, when solving POMDP problems, gaining information can lead to efficient solutions. In this paper we derive a set of conditions where MDP-based POMDP solvers are provably sub-optimal. We then use the well-known tiger problem to demonstrate such sub-optimality. We show that multi-resolution, budgeted information gathering cannot be addressed using MDP-based POMDP solvers. The contribution of the paper helps identify the properties of a POMDP problem for which the use of MDP-based POMDP solvers is inappropriate, enabling better design choices.

What problem does this paper attempt to address?

The paper addresses the limitations and failures of approximate solvers based on Markov Decision Processes (MDPs) when they are used to solve multi-resolution information-gathering tasks modeled as Partially Observable Markov Decision Processes (POMDPs). Specifically, it points out that when the environment is uncertain, MDP solvers cannot effectively choose actions that gather information, because of their inherent assumption that the environmental state is either already known or that the uncertainty will disappear after the next step. As a result, MDP-based POMDP solvers perform significantly worse than the optimal policy in certain cases, such as the well-known tiger problem and multi-resolution information-gathering problems.

### Core Problems of the Paper

1. **Limitations of MDP-based POMDP Solvers**:
   - **Lack of Information-Gathering Motivation**: MDP solvers assume that the environmental state is known or that the uncertainty will disappear after the next step, so they have no motivation to take actions that gather information.
   - **Sub-optimal Behavior**: In some cases, MDP-based POMDP solvers take sub-optimal or even disastrous actions because they cannot identify and execute actions that reduce uncertainty.
2. **Examples of Specific Problems**:
   - **Tiger Problem**: In this classic problem, an agent must determine which door a tiger is behind by listening, so that it avoids opening the wrong door. MDP-based POMDP solvers do not choose the listening action because it provides no immediate reward, so the agent effectively picks a door at random with only a 50% chance of success.
   - **Multi-resolution Information Gathering**: In multi-resolution information gathering by Unmanned Aerial Vehicles (UAVs), MDP-based POMDP solvers cannot exploit the UAV's ability to ascend and obtain low-resolution information, because these actions provide no immediate reward.

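
The tiger example can be made concrete with a small one-shot calculation. The sketch below is an illustration, not the paper's algorithm: the reward values, the listening cost, and the 85% listening accuracy are commonly used tiger-problem parameters assumed here (the summary does not state them), and `sampled_state_value` is a hypothetical stand-in for a solver that samples a state from the belief and acts as if that sample were the true state.

```python
# Assumed tiger-problem parameters (not stated in the summary).
R_SAFE = 10.0       # reward for opening the door without the tiger
R_TIGER = -100.0    # reward for opening the door with the tiger
LISTEN_COST = 1.0   # cost paid for one listening action
ACCURACY = 0.85     # probability that listening reports the correct side


def open_value(p_tiger_left: float, door: str) -> float:
    """Expected reward of opening `door` when P(tiger behind left) = p_tiger_left."""
    p_tiger_here = p_tiger_left if door == "left" else 1.0 - p_tiger_left
    return p_tiger_here * R_TIGER + (1.0 - p_tiger_here) * R_SAFE


def best_open_value(p_tiger_left: float) -> float:
    """Best expected reward achievable by opening a door right now."""
    return max(open_value(p_tiger_left, "left"), open_value(p_tiger_left, "right"))


def posterior(p_tiger_left: float, heard: str) -> tuple[float, float]:
    """Bayes update after one listen; returns (posterior, probability of `heard`)."""
    like_left = ACCURACY if heard == "left" else 1.0 - ACCURACY   # P(heard | tiger left)
    like_right = 1.0 - ACCURACY if heard == "left" else ACCURACY  # P(heard | tiger right)
    evidence = like_left * p_tiger_left + like_right * (1.0 - p_tiger_left)
    return like_left * p_tiger_left / evidence, evidence


def value_of_information(p_tiger_left: float) -> float:
    """Expected best value after one listen minus best value without listening."""
    after = 0.0
    for heard in ("left", "right"):
        post, prob = posterior(p_tiger_left, heard)
        after += prob * best_open_value(post)
    return after - best_open_value(p_tiger_left)


def sampled_state_value(p_tiger_left: float) -> float:
    """Expected reward when a state is sampled from the belief and the door
    that is safe under that sample is opened (hypothetical stand-in)."""
    p = p_tiger_left
    p_match = p * p + (1.0 - p) * (1.0 - p)  # sample agrees with the true state
    return p_match * R_SAFE + (1.0 - p_match) * R_TIGER


if __name__ == "__main__":
    b = 0.5  # uniform initial belief
    print("open on a sampled state:", sampled_state_value(b))
    print("value of one listen:    ", value_of_information(b))
    print("listen once, then open: ", best_open_value(b) + value_of_information(b) - LISTEN_COST)
```

Under these assumed numbers and a uniform belief, acting on a sampled state is right only half the time and has an expected reward of about -45, while the value of a single listen (about 38.5) far exceeds its cost of 1. This is the kind of gap between information value and information cost that the summary describes.
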

### Main Contributions of the Paper

1. **Conditions and Definitions**:
   - **Value of Information**: Defines the value of information and proves that the expected value of information is always non-negative.
   - **Information-Gathering Actions**: Defines information-gathering actions as actions that are not part of the optimal MDP policy but can reduce uncertainty through observation.
2. **Theoretical Analysis**:
   - **Proof of Sub-optimality**: Proves that if there is an information-gathering action whose value of information is greater than its cost, then the MDP-based POMDP solver is sub-optimal.
3. **Experimental Verification**:
   - **Tiger Problem**: Demonstrates the failure of MDP-based POMDP solvers on the tiger problem through a detailed example.
   - **Multi-resolution Information Gathering**: Further verifies the limitations of MDP-based POMDP solvers on a simplified multi-resolution information-gathering problem.

### Conclusion

Through theoretical analysis and experimental verification, the paper clarifies the limitations of MDP-based POMDP solvers on POMDP problems that require information gathering. This provides a theoretical basis for choosing more suitable solvers and helps researchers and engineers make better design choices when facing similar problems.
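
The sub-optimality condition summarized above can be written in a standard one-step form. This is a generic textbook-style formulation and not necessarily the paper's exact definition; the symbols are assumptions introduced here: $b$ is the current belief, $b_o$ the belief after observing $o$, $Q(b,a) = \sum_s b(s)\,Q(s,a)$ the belief-averaged action value, and $c$ the cost of the information-gathering action.

$$
\mathrm{VoI}(b) \;=\; \sum_{o} P(o \mid b)\,\max_{a} Q(b_o, a) \;-\; \max_{a} Q(b, a) \;\ge\; 0
$$

Non-negativity follows because $b = \sum_o P(o \mid b)\, b_o$ and $Q$ is linear in the belief, so for every action $a$

$$
Q(b, a) \;=\; \sum_{o} P(o \mid b)\, Q(b_o, a) \;\le\; \sum_{o} P(o \mid b)\,\max_{a'} Q(b_o, a').
$$

In this notation, the condition described in the summary reads $\mathrm{VoI}(b) > c$: gathering the information and then acting is worth more than acting immediately, which is exactly what a solver with no incentive to gather information never does.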

Hindsight is Only 50/50: Unsuitability of MDP based Approximate POMDP Solvers for Multi-resolution Information Gathering

Monte Carlo Sampling Methods for Approximating Interactive POMDPs

Anytime Point-Based Approximations for Large POMDPs

Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives

Bridging the Gap between Partially Observable Stochastic Games and Sparse POMDP Methods

Online POMDP Planning with Anytime Deterministic Guarantees

Online algorithms for POMDPs with continuous state, action, and observation spaces

Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach

Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight

Control Theory Meets POMDPs: A Hybrid Systems Approach

Optimality Guarantees for Particle Belief Approximation of POMDPs

Recursively-Constrained Partially Observable Markov Decision Processes

What should be observed for optimal reward in POMDPs?

PODDP: Partially Observable Differential Dynamic Programming for Latent Belief Space Planning

Hybrid Heuristic Online Planning for POMDPs

Solving Truly Massive Budgeted Monotonic POMDPs with Oracle-Guided Meta-Reinforcement Learning

Prospective Side Information for Latent MDPs

A Framework for Sequential Planning in Multi-Agent Settings

Probabilistic decision-making under uncertainty for autonomous driving using continuous POMDPs

Solving Hierarchical Information-Sharing Dec-POMDPs: An Extensive-Form Game Approach