Abstract: Partially Observable Markov Decision Process (POMDP) is a mathematical framework for modeling decision-making under uncertainty, where the agent's observations are incomplete and the underlying system dynamics are probabilistic. Solving the POMDP problem within the model-free paradigm is challenging for agents due to the inherent difficulty in accurately identifying and distinguishing between states and observations. We define such a difficult problem as a DETerministic Partially Observable Markov Decision Process (DET-POMDP) problem, which is a specific setting of POMDP. In this problem, states and observations are in a many-to-one relationship. The state is obscured, and its relationship is less apparent to the agent. This creates obstacles for the agent to infer the state through observations. To effectively address this problem, we convert DET-POMDP into a fully observable MDP using a model-free biomimetics algorithm called BIOMAP. BIOMAP is based on the MDP Graph Automaton framework to distinguish authentic environmental information from fraudulent data. Thus, it enhances the agent's ability to develop stable policies against DET-POMDP. The experimental results highlight the superior capabilities of BIOMAP in maintaining operational effectiveness and environmental reparability in the presence of environmental deceptions when compared with existing POMDP solvers. This research opens up new avenues for the deployment of reliable POMDP-based systems in fields that are particularly susceptible to DET-POMDP problems.

What problem does this paper attempt to address?

This paper addresses how to make effective decisions without an environment model in a partially observable Markov decision process (POMDP), specifically in the deterministic partially observable Markov decision process (DET-POMDP) setting. The research focuses on the following problems:

1. **Uncertainty of States and Observations**: In DET-POMDP, there is a many-to-one relationship between states and observations, which makes it difficult for agents to identify states accurately from observations. For example, in a medical diagnosis scenario, different patients may show the same instrument readings even though their actual health conditions differ.
2. **Cognitive Fog Phenomenon**: Because several states overlap on the same observation in DET-POMDP, agents may receive conflicting reward signals, which leads to Q-value deviations and degrades the accuracy of decision-making. The paper calls this phenomenon "Cognitive Fog".
3. **Challenges of Model-Free Methods**: Most existing POMDP solvers are model-based, but in the real world it is usually impossible to obtain an accurate environment model in advance. A model-free method is therefore required to solve the DET-POMDP problem.

To solve these problems, the paper proposes a bionic algorithm named BIOMAP, which draws on the path integration ability of desert ants. BIOMAP transforms the DET-POMDP problem into a fully observable MDP problem by constructing the MDP-Graph-Automaton framework, and it uses a shortest-path algorithm to find the optimal strategy. Experimental results show that BIOMAP copes well with environmental deception, maintains the effectiveness of operations and the repairability of the environment, and outperforms existing POMDP solvers.
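The general idea of disambiguating identical observations by path integration and then planning on the resulting graph can be illustrated with a toy example. The sketch below is not BIOMAP or the MDP-Graph-Automaton framework itself; the corridor environment, the use of integrated displacement as a node label, and the helper names `available_actions`, `build_graph`, and `shortest_path` are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical 1-D corridor: cells 0..4, start in cell 0, goal in cell 4.
# Cells 1, 2 and 3 emit the same observation, so observations are ambiguous.
N_CELLS = 5
GOAL = 4
ACTIONS = {"left": -1, "right": +1}


def available_actions(cell):
    """Action set A_s: moves that would leave the corridor are not offered."""
    return [a for a, d in ACTIONS.items() if 0 <= cell + d < N_CELLS]


def step(cell, action):
    """Deterministic transition T(s, a)."""
    return cell + ACTIONS[action]


def observe(cell):
    """Many-to-one observation function: interior cells look identical."""
    if cell == 0:
        return "start"
    if cell == GOAL:
        return "goal"
    return "hall"


def build_graph(start_cell=0):
    """Explore and label each node by integrated displacement from the start.

    The planner only sees displacement labels and observations. The `hidden`
    map holds the true cell and is used solely to simulate the environment.
    """
    graph = {}                  # node -> {action: next node}
    hidden = {0: start_cell}    # node -> true cell (simulation only)
    frontier = deque([0])
    while frontier:
        node = frontier.popleft()
        if node in graph:
            continue
        graph[node] = {}
        for action in available_actions(hidden[node]):
            nxt = node + ACTIONS[action]  # path integration: sum displacements
            graph[node][action] = nxt
            hidden.setdefault(nxt, step(hidden[node], action))
            frontier.append(nxt)
    goal_nodes = {n for n, c in hidden.items() if observe(c) == "goal"}
    return graph, goal_nodes


def shortest_path(graph, start, goals):
    """Breadth-first search; returns the action sequence to the nearest goal."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in goals:
            actions = []
            while parent[node] is not None:
                node, action = parent[node]
                actions.append(action)
            return actions[::-1]
        for action, nxt in graph[node].items():
            if nxt not in parent:
                parent[nxt] = (node, action)
                queue.append(nxt)
    return None


graph, goal_nodes = build_graph()
plan = shortest_path(graph, 0, goal_nodes)
print(plan)
```

Because the transitions are deterministic and the start is known, the accumulated displacement identifies each cell uniquely even though three cells share the observation `"hall"`. Plain breadth-first search is enough here because every move has unit cost; a weighted graph would call for something like Dijkstra's algorithm instead.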
### Key Formulas

- **DET-POMDP Definition**:
  - State space \( S \)
  - Action set \( A_s \) for each state \( s \)
  - Observation set \( O \)
  - Initial state subset or distribution \( S_0 \)
  - Target state subset \( S_g \)
  - Deterministic state-transition function \( T(s, a) \)
  - Deterministic observation function \( \Omega(s, a) \)
  - Reward function \( R(s, a) \)
- **Q-value Variance Calculation**:

  \[ \text{Var}(Q)=\frac{1}{n} \sum_{i = 1}^{n}\left(Q(s_i, a)-\frac{1}{n} \sum_{j = 1}^{n}Q(s_j, a)\right)^2 \]

Through these methods and formulas, the paper aims to provide a new and reliable solution to the challenges of the DET-POMDP problem.
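The Q-value variance formula can be evaluated directly. The following minimal sketch assumes that \( s_1, \dots, s_n \) are the hidden states sharing one observation and that their Q-values for a fixed action \( a \) are already known; the function name `q_variance` and the sample values are hypothetical and do not come from the paper.

```python
def q_variance(q_values):
    """Population variance of Q(s_i, a) over n states, as in the formula."""
    n = len(q_values)
    if n == 0:
        raise ValueError("q_values must contain at least one entry")
    mean_q = sum(q_values) / n
    return sum((q - mean_q) ** 2 for q in q_values) / n


# Two hidden states behind one observation with conflicting values for the
# same action: the variance is large.
aliased_q = [1.0, -1.0]
# Two hidden states whose values agree: the variance is zero.
consistent_q = [1.0, 1.0]

print(q_variance(aliased_q))     # 1.0
print(q_variance(consistent_q))  # 0.0
```

A variance of zero means the states behind an observation agree on the value of the action, while a large variance indicates the kind of conflicting signal the summary describes as Cognitive Fog.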

A Model-free Biomimetics Algorithm for Deterministic Partially Observable Markov Decision Process

Model-Based Robot Learning Control with Uncertainty Directed Exploration

Partially Observable Markov Decision Processes in Robotics: A Survey

OCMDP: Observation-Constrained Markov Decision Process

Recursively-Constrained Partially Observable Markov Decision Processes

Cost-Bounded Active Classification Using Partially Observable Markov Decision Processes

A partially observable multi-ship collision avoidance decision-making model based on deep reinforcement learning

Control Theory Meets POMDPs: A Hybrid Systems Approach

Modeling and Control Architecture for the Competitive Networked Robot System Based on POMDP

Explainable Finite-Memory Policies for Partially Observable Markov Decision Processes

Hindsight is Only 50/50: Unsuitability of MDP based Approximate POMDP Solvers for Multi-resolution Information Gathering

Optimality Guarantees for Particle Belief Approximation of POMDPs

A Novel Deep Reinforcement Learning for POMDP-based Autonomous Ship Collision Decision-Making

Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes

ODE-based Recurrent Model-free Reinforcement Learning for POMDPs

Situation-aware decision making for autonomous driving on urban road using online POMDP

Bridging POMDPs and Bayesian decision making for robust maintenance planning under model uncertainty: An application to railway systems

Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach

Model-based motion planning in POMDPs with temporal logic specifications

PODDP: Partially Observable Differential Dynamic Programming for Latent Belief Space Planning

Model-Based Opponent Modeling