A Model-free Biomimetics Algorithm for Deterministic Partially Observable Markov Decision Process

Yide Yu, Yue Liu, Xiaochen Yuan, Dennis Wong, Huijie Li, Yan Ma
2024-12-19
Abstract: Partially Observable Markov Decision Process (POMDP) is a mathematical framework for modeling decision-making under uncertainty, where the agent's observations are incomplete and the underlying system dynamics are probabilistic. Solving the POMDP problem within the model-free paradigm is challenging for agents due to the inherent difficulty in accurately identifying and distinguishing between states and observations. We define such a difficult problem as a DETerministic Partially Observable Markov Decision Process (DET-POMDP) problem, which is a specific setting of POMDP. In this problem, states and observations are in a many-to-one relationship. The state is obscured, and its relationship is less apparent to the agent. This creates obstacles for the agent to infer the state through observations. To effectively address this problem, we convert DET-POMDP into a fully observable MDP using a model-free biomimetics algorithm called BIOMAP. BIOMAP is based on the MDP Graph Automaton framework to distinguish authentic environmental information from fraudulent data. Thus, it enhances the agent's ability to develop stable policies against DET-POMDP. The experimental results highlight the superior capabilities of BIOMAP in maintaining operational effectiveness and environmental reparability in the presence of environmental deceptions when compared with existing POMDP solvers. This research opens up new avenues for the deployment of reliable POMDP-based systems in fields that are particularly susceptible to DET-POMDP problems.
Systems and Control
What problem does this paper attempt to address?
This paper addresses how an agent can make effective decisions without an environment model in a partially observable Markov decision process (POMDP), specifically in the deterministic setting (DET-POMDP). The research focuses on three problems:

1. **Uncertainty of States and Observations**: In a DET-POMDP, states map to observations many-to-one, so several distinct states can produce the same observation and the agent cannot reliably identify the state from what it observes. For example, in medical diagnosis, different patients may show the same instrument readings while their actual health conditions differ.
2. **Cognitive Fog Phenomenon**: Because states overlap under the same observation, the agent may receive conflicting reward signals, which biases its Q-values and degrades decision accuracy. The paper calls this phenomenon "Cognitive Fog".
3. **Challenges of Model-Free Methods**: Most existing POMDP solvers are model-based, but in real-world settings an accurate environment model is usually not available in advance, so a model-free method is needed to solve the DET-POMDP problem.

To solve these problems, the paper proposes a biomimetic algorithm named BIOMAP, which draws on the path-integration ability of desert ants. BIOMAP converts the DET-POMDP problem into a fully observable MDP by constructing the MDP Graph Automaton framework, then uses a shortest-path algorithm to find the optimal policy. According to the reported experiments, BIOMAP handles environmental deception well, maintaining operational effectiveness and environmental reparability, and outperforms the existing POMDP solvers it was compared against.
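The Cognitive Fog effect described above can be illustrated with a toy example. The sketch below is not BIOMAP and does not come from the paper; the state names, the single action `go`, and the reward values are all hypothetical. It only shows how averaging rewards per observation, rather than per hidden state, blends two conflicting signals into one misleading estimate.

```python
from collections import defaultdict

# Hypothetical toy DET-POMDP: two hidden states emit the same observation.
OBSERVATION = {"s0": "o", "s1": "o"}  # many-to-one: state -> observation
REWARD = {("s0", "go"): 1.0, ("s1", "go"): -1.0}  # deterministic R(s, a)


def average_rewards(key_fn, visits):
    """Average one-step rewards, keyed by what key_fn exposes to the agent."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, action in visits:
        key = (key_fn(state), action)
        totals[key] += REWARD[(state, action)]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# The agent visits both hidden states equally often.
visits = [("s0", "go"), ("s1", "go")] * 50

# An agent that could see the true state keeps the two estimates apart.
q_by_state = average_rewards(lambda s: s, visits)

# An agent that only sees the observation mixes the conflicting rewards.
q_by_observation = average_rewards(lambda s: OBSERVATION[s], visits)

print(q_by_state)
print(q_by_observation)
```

In this toy setting the observation-keyed estimate sits between the two true values, so it is correct for neither hidden state. That is the kind of Q-value deviation the summary attributes to overlapping states.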
### Key Formulas

- **DET-POMDP Definition**:
  - State space \( S \)
  - Action set \( A_s \) for each state \( s \)
  - Observation set \( O \)
  - Initial state subset or distribution \( S_0 \)
  - Target state subset \( S_g \)
  - Deterministic state-transition function \( T(s, a) \)
  - Deterministic observation function \( \Omega(s, a) \)
  - Reward function \( R(s, a) \)
- **Q-value Variance Calculation**:

  \[ \text{Var}(Q) = \frac{1}{n} \sum_{i=1}^{n} \left( Q(s_i, a) - \frac{1}{n} \sum_{j=1}^{n} Q(s_j, a) \right)^2 \]

Through these methods and formulas, the paper aims to provide a new and reliable solution to meet the challenges in the DET-POMDP problem.
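As a worked illustration of the Q-value variance formula, the following minimal sketch computes the population variance of \( Q(s_i, a) \) over \( n \) states for one fixed action. The summary does not say which states \( s_1, \dots, s_n \) are meant; the sketch assumes they are the states being compared, for instance those that share an observation. The function name `q_value_variance` and the sample numbers are hypothetical.

```python
def q_value_variance(q_values):
    """Population variance of Q(s_i, a) over n states for a fixed action a.

    q_values: list of floats, one Q(s_i, a) per state s_i (assumed input shape).
    """
    n = len(q_values)
    if n == 0:
        raise ValueError("need at least one Q-value")
    mean_q = sum(q_values) / n  # inner term: (1/n) * sum_j Q(s_j, a)
    return sum((q - mean_q) ** 2 for q in q_values) / n  # outer 1/n average


# Two states with conflicting values for the same action.
conflicting = q_value_variance([1.0, -1.0])

# Three states that agree exactly.
agreeing = q_value_variance([2.0, 2.0, 2.0])

print(conflicting, agreeing)
```

Under this reading, a variance of zero means the compared states agree on the value of the action, while a large variance suggests that distinct states with conflicting values are being treated as one.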