Strong Simple Policies for POMDPs

Leonore Winterer, Ralf Wimmer, Bernd Becker, Nils Jansen
DOI: https://doi.org/10.1007/s10009-024-00747-0
2024-06-11
International Journal on Software Tools for Technology Transfer
Abstract: The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that provably adheres to one or more specifications. Yet, the general problem is undecidable, and policies require full (and thus potentially unbounded) traces of execution history. To provide good approximations of such policies, POMDP agents often employ randomization over action choices. We consider the problem of computing simpler policies for POMDPs, and provide several approaches to still ensure their expressiveness. Key aspects are (1) the combination of an arbitrary number of specifications the policies need to adhere to, (2) a restricted form of randomization, and (3) a light-weight preprocessing of the POMDP model to encode memory. We provide a novel encoding as a mixed-integer linear program as a baseline to solve the underlying problems. Our experiments demonstrate that the policies we obtain are more robust, smaller, and easier to implement for an engineer than those obtained from state-of-the-art POMDP solvers.
computer science, software engineering