Markov Decision Processes with Noisy State Observation

Amirhossein Afsharrad,Sanjay Lall

2023-12-14

Abstract:This paper addresses the challenge of a particular class of noisy state observations in Markov Decision Processes (MDPs), a common issue in various real-world applications. We focus on modeling this uncertainty through a confusion matrix that captures the probabilities of misidentifying the true state. Our primary goal is to estimate the inherent measurement noise, and to this end, we propose two novel algorithmic approaches. The first, the method of second-order repetitive actions, is designed for efficient noise estimation within a finite time window, providing identifiable conditions for system analysis. The second approach comprises a family of Bayesian algorithms, which we thoroughly analyze and compare in terms of performance and limitations. We substantiate our theoretical findings with simulations, demonstrating the effectiveness of our methods in different scenarios, particularly highlighting their behavior in environments with varying stationary distributions. Our work advances the understanding of reinforcement learning in noisy environments, offering robust techniques for more accurate state estimation in MDPs.

Machine Learning,Systems and Control

What problem does this paper attempt to address?

This paper attempts to solve the problem of noisy state observations in Markov Decision Processes (MDPs). Specifically, the author focuses on how to model this uncertainty through the confusion matrix, which captures the probability of misidentifying the true state. The main goal of the paper is to estimate the inherent measurement noise and proposes two new algorithmic approaches for this purpose: 1. **Method of Second - Order Repetitive Actions**: - This method aims to efficiently estimate noise within a limited time window and provides identifiability conditions for system analysis. - By repeatedly selecting a certain action and recording the observed state transitions over a period of time, the state distribution can be stabilized, making noise estimation feasible. 2. **Family of Bayesian Algorithms**: - These algorithms perform noise estimation by maintaining and updating the probability distributions of unknown parameters and the current state. - The paper conducts a detailed analysis and comparison of these Bayesian algorithms and explores their performance and limitations. Through these methods, the paper aims to improve the robustness and accuracy of reinforcement learning in noisy environments, especially in different scenarios, particularly in environments with different stationary distributions. The author verifies the effectiveness of these methods through simulation experiments and shows their performance in different situations.

Markov Decision Processes with Noisy State Observation

Robust Anytime Learning of Markov Decision Processes

Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations

Robust Active Measuring under Model Uncertainty

OCMDP: Observation-Constrained Markov Decision Process

Robust Markov Decision Processes without Model Estimation

Numerical method to solve impulse control problems for partially observed piecewise deterministic Markov processes

Sufficient Markov Decision Processes with Alternating Deep Neural Networks

Markov Decision Processes with Continuous Side Information

Provably Efficient UCB-type Algorithms For Learning Predictive State Representations

Monitored Markov Decision Processes

Intermittently Observable Markov Decision Processes

A Bayesian Approach to Learning Bandit Structure in Markov Decision Processes

Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes

Online Markov Decision Processes with Non-Oblivious Strategic Adversary

Decision-Making Under Uncertainty: Beyond Probabilities

Dynamic noise estimation: A generalized method for modeling noise fluctuations in decision-making

Robust Markov Decision Processes: A Place Where AI and Formal Methods Meet

Information-Theoretic Opacity-Enforcement in Markov Decision Processes

Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space