Robust Action Selection in Partially Observable Markov Decision Processes with Model Uncertainty

Hala Mostafa, Mahmoud El Chamie
DOI: https://doi.org/10.1109/CDC.2018.8619468
2018-12-01
Abstract: Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty in both the state transitions and the sensing of the underlying state. Model uncertainty becomes an important concern when the models for which an action policy was optimized change over time, e.g., when degrading sensors cause a drift in the observation function. Replanning a policy whenever a model drifts, if feasible at all, is both time-consuming and computationally expensive. At the other extreme, ignoring the drift and following the original policy can lead to high-risk actions with high costs. We present an efficient approach that post-processes a policy computed using the initial models to select actions that are robust to changes in the observation function. The key idea is to maintain a belief region, rather than a single belief point, about the state of the system, and to perform online robust action selection w.r.t. the current belief region. Specifically, we formulate a convex optimization problem to select the action that maximizes the worst-case reward over a convexified belief region. Simulation results demonstrate the ability of our approach to avoid high-risk actions when the system is in uncertain states.
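The max-min idea in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes, for illustration only, that the belief region is the convex hull of a finite set of belief points and that each action has a reward that is linear in the belief. Under those assumptions the worst case over the region is attained at a vertex, so no solver is needed. The function name `robust_action`, the state and action labels, and all numbers are hypothetical.

```python
import numpy as np


def robust_action(belief_vertices, rewards):
    """Pick the action with the best worst-case expected reward.

    belief_vertices: (V, S) array; each row is a belief point (sums to 1).
        The belief region is assumed to be the convex hull of these rows.
    rewards: (A, S) array; rewards[a, s] is the reward of action a in state s.
    Returns (index of the selected action, its worst-case expected reward).
    """
    B = np.asarray(belief_vertices, dtype=float)
    R = np.asarray(rewards, dtype=float)
    # Expected reward of every action at every vertex, shape (A, V).
    expected = R @ B.T
    # The expected reward is linear in the belief, so its minimum over
    # the convex hull is attained at one of the vertices.
    worst_case = expected.min(axis=1)
    best = int(np.argmax(worst_case))
    return best, float(worst_case[best])


# Hypothetical two-state example: states are (healthy, faulty).
rewards = [[10.0, -100.0],   # action 0: "proceed"
           [-1.0, -1.0]]     # action 1: "inspect"
nominal = [[0.95, 0.05]]                 # single belief point
region = [[0.95, 0.05], [0.80, 0.20]]    # belief region under sensor drift

point_choice, _ = robust_action(nominal, rewards)
region_choice, _ = robust_action(region, rewards)
```

With the single belief point the sketch selects "proceed", while with the wider region it selects "inspect", because "proceed" has a large negative expected reward at the vertex that places more weight on the faulty state. This mirrors, in a toy setting, the abstract's claim that reasoning over a belief region avoids high-risk actions when the state is uncertain.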
Computer Science