Hindsight is Only 50/50: Unsuitability of MDP based Approximate POMDP Solvers for Multi-resolution Information Gathering

Sankalp Arora, Sanjiban Choudhury, Sebastian Scherer
DOI: https://doi.org/10.48550/arXiv.1804.02573
2018-04-08
Abstract: Partially Observable Markov Decision Processes (POMDPs) offer an elegant framework to model sequential decision making in uncertain environments. Solving POMDPs online is an active area of research and, given the size of real-world problems, approximate solvers are used. Recently, a few approaches have been suggested for solving POMDPs by using MDP solvers in conjunction with imitation learning. MDP based POMDP solvers work well for some cases, while catastrophically failing for others. The main failure point of such solvers is the lack of motivation for MDP solvers to gain information, since under their assumption the environment is either already known as much as it can be or the uncertainty will disappear after the next step. However, for solving POMDP problems, gaining information can lead to efficient solutions. In this paper we derive a set of conditions where MDP based POMDP solvers are provably sub-optimal. We then use the well-known tiger problem to demonstrate such sub-optimality. We show that multi-resolution, budgeted information gathering cannot be addressed using MDP based POMDP solvers. The contribution of the paper helps identify the properties of a POMDP problem for which the use of MDP based POMDP solvers is inappropriate, enabling better design choices.
Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the limitations and failure modes of approximate solvers built on Markov Decision Process (MDP) solvers when they are applied to multi-resolution information-gathering tasks modeled as Partially Observable Markov Decision Processes (POMDPs). Specifically, the paper points out that under environmental uncertainty, MDP solvers cannot effectively choose actions that gather information, because of their inherent assumptions: either the environment state is already known, or the uncertainty will disappear after the next step. As a result, MDP-based POMDP solvers perform significantly worse than the optimal policy in certain cases, such as the well-known tiger problem and multi-resolution information gathering.

### Core Problems of the Paper

1. **Limitations of MDP-based POMDP Solvers**:
   - **Lack of Information-Gathering Motivation**: MDP solvers assume that the environment state is known or that the uncertainty will disappear after the next step, so they have no motivation to take actions that gather information.
   - **Sub-optimal Behavior**: In some cases, MDP-based POMDP solvers take sub-optimal or even catastrophic actions because they cannot identify and execute actions that reduce uncertainty.
2. **Examples of Specific Problems**:
   - **Tiger Problem**: In this classic problem, an agent must infer which door hides a tiger by listening for sounds, so that it avoids opening the wrong door. MDP-based POMDP solvers do not choose the listen action because it provides no immediate reward, so the agent picks a door at random with only a 50% chance of success.
   - **Multi-resolution Information Gathering**: In multi-resolution information gathering with Unmanned Aerial Vehicles (UAVs), MDP-based POMDP solvers cannot exploit the UAV's ability to ascend and collect low-resolution information, because ascending provides no immediate reward.
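
The tiger-problem failure described above can be made concrete with a small calculation. The sketch below is a minimal illustration, not the paper's formulation: it assumes the commonly used parameters for the classic tiger problem (reward +10 for the safe door, -100 for the tiger's door, a listen cost of 1, and a listen accuracy of 0.85), none of which are given in this summary, and it allows only a single listen before a door is opened. All function and constant names are hypothetical.

```python
# Assumed parameters: commonly used tiger-problem values, not stated in the text above.
R_SAFE = 10.0       # reward for opening the door without the tiger
R_TIGER = -100.0    # penalty for opening the tiger's door
LISTEN_COST = 1.0   # cost of one listen action
ACCURACY = 0.85     # probability that the sound comes from the tiger's true side


def best_open_value(p_left: float) -> float:
    """Expected reward of the better door, given P(tiger is left) = p_left."""
    open_left = p_left * R_TIGER + (1 - p_left) * R_SAFE
    open_right = p_left * R_SAFE + (1 - p_left) * R_TIGER
    return max(open_left, open_right)


def posterior_left(p_left: float, heard_left: bool) -> float:
    """Bayes update of P(tiger is left) after one listen."""
    like_left = ACCURACY if heard_left else 1 - ACCURACY
    like_right = 1 - ACCURACY if heard_left else ACCURACY
    evidence = like_left * p_left + like_right * (1 - p_left)
    return like_left * p_left / evidence


def value_after_listen(p_left: float) -> float:
    """Expected reward of the best door after one listen, excluding the listen cost."""
    p_hear_left = ACCURACY * p_left + (1 - ACCURACY) * (1 - p_left)
    return (p_hear_left * best_open_value(posterior_left(p_left, True))
            + (1 - p_hear_left) * best_open_value(posterior_left(p_left, False)))


prior = 0.5                                   # uniform belief over the tiger's location
guess = best_open_value(prior)                # open a door without listening
informed = value_after_listen(prior)          # open a door after one listen
value_of_information = informed - guess       # gain from the observation alone

print("guess without listening:", guess)
print("listen once, then open: ", informed - LISTEN_COST)
print("value of information:   ", value_of_information)
```

Under these assumed numbers, guessing yields an expected reward of about -45, while listening once and then opening the door away from the sound yields about -7.5. The observation is worth roughly 38.5, far more than its cost of 1, which is the kind of gap a solver without an incentive to gather information leaves on the table.
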
### Main Contributions of the Paper

1. **Conditions and Definitions**:
   - **Expected Value of Information**: Defines the value of information and proves that the expected value of information is always non-negative.
   - **Information-Gathering Actions**: Defines information-gathering actions as actions that are not part of the optimal MDP policy but can reduce uncertainty through observation.
2. **Theoretical Analysis**:
   - **Sub-optimality Proof**: Proves that if there exists an information-gathering action whose value of information exceeds its cost, the MDP-based POMDP solver is sub-optimal.
3. **Experimental Verification**:
   - **Tiger Problem**: Demonstrates the failure of MDP-based POMDP solvers on the tiger problem through a detailed example.
   - **Multi-resolution Information Gathering**: Further verifies the limitations of MDP-based POMDP solvers on a simplified multi-resolution information-gathering problem.

### Conclusion

Through theoretical analysis and experimental verification, the paper clarifies the limitations of MDP-based POMDP solvers on POMDP problems that require information gathering. This provides a theoretical basis for designing more suitable solvers and helps researchers and engineers make better design choices when facing similar problems.
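
The non-negativity of the expected value of information listed under the contributions follows from a standard decision-theoretic argument, sketched below. The notation is an assumption made for illustration and may differ from the paper's: $b$ is the current belief over states $s$, $a$ an action, $o$ an observation, $Q(s, a)$ the value of taking $a$ in state $s$, and $c$ the cost of the information-gathering action.

```latex
% Assumed notation, not necessarily the paper's.
\begin{align*}
V_{\text{no info}}(b) &= \max_{a} \, \mathbb{E}_{s \sim b}\big[ Q(s, a) \big], \\
V_{\text{info}}(b)    &= \mathbb{E}_{o}\Big[ \max_{a} \, \mathbb{E}_{s \sim b(\cdot \mid o)}\big[ Q(s, a) \big] \Big], \\
\mathrm{EVI}(b)       &= V_{\text{info}}(b) - V_{\text{no info}}(b) \;\ge\; 0 .
\end{align*}
% Why the inequality holds: for any fixed action a,
%   E_o[ max_{a'} E_{s ~ b(.|o)}[Q(s, a')] ] >= E_o[ E_{s ~ b(.|o)}[Q(s, a)] ] = E_{s ~ b}[Q(s, a)],
% by the law of total expectation; taking the maximum over a on the right gives the result.
```

In this notation, the sub-optimality condition summarized above reads $\mathrm{EVI}(b) > c$: when observing is worth more than it costs, a solver that never selects the information-gathering action cannot be optimal.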