Policy Graph Pruning and Optimization in Monte Carlo Value Iteration for Continuous-State POMDPs

Weisheng Qian, Quan Liu, Zongzhang Zhang, Zhiyuan Pan, Shan Zhong
DOI: https://doi.org/10.1109/SSCI.2016.7849833
2016-01-01
Abstract: Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical framework for solving realistic problems with continuous spaces. The recently introduced Monte Carlo Value Iteration (MCVI) algorithm can tackle such problems with continuous state spaces. Instead of representing the value function explicitly with a set of alpha-functions, it represents it implicitly with a policy graph. However, the graph grows over time and MCVI takes no measures to optimize it, which makes it unsuitable for devices with limited resources such as smartwatches. This paper introduces three novel techniques to prune and optimize the policy graph obtained by MCVI. First, we optimize the internal structure of a policy graph G whenever a new node is added to it. Second, we evaluate the value of each node in G and prune the nodes dominated by others. Third, we prune redundant nodes, that is, nodes that are not reachable from the initial action node in any optimal policy graph. Empirical results on the corridor and musical chairs problems show that our pruning and optimization methods construct more compact policy graphs of comparable quality.
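The second and third techniques can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a hypothetical representation in which each node stores an action, observation-labeled edges to successor nodes, and a vector of value estimates at a shared, finite set of sampled states (for instance obtained by Monte Carlo simulation). A node is treated as dominated when another node's estimates are at least as high at every sampled state, and its incoming edges are redirected to the dominating node. The reachability step is simplified to a breadth-first search over the current graph rather than over every optimal policy graph. The names `Node`, `dominates`, `prune_dominated` and `prune_unreachable`, and the toy graph, are illustrative only.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical policy-graph node: an action plus observation-labeled edges."""
    action: str
    edges: dict[str, int] = field(default_factory=dict)  # observation -> successor id
    # Estimated values of this node's sub-policy at a shared set of sampled
    # states (assumed to come from Monte Carlo simulation).
    values: list[float] = field(default_factory=list)


def dominates(a: list[float], b: list[float]) -> bool:
    """True if estimate vector `a` is at least `b` at every sampled state."""
    return all(x >= y for x, y in zip(a, b))


def prune_dominated(graph: dict[int, Node], root: int) -> int:
    """Remove pointwise-dominated nodes and redirect their incoming edges."""
    replacement: dict[int, int] = {}
    ids = list(graph)
    for i in ids:
        if i == root:
            continue  # never remove the initial action node
        for j in ids:
            # A node already marked for removal cannot serve as a replacement.
            if j != i and j not in replacement and dominates(graph[j].values, graph[i].values):
                replacement[i] = j
                break

    def resolve(n: int) -> int:
        # Follow chains such as i -> j -> k to a surviving node.
        while n in replacement:
            n = replacement[n]
        return n

    for i in replacement:
        del graph[i]
    for node in graph.values():
        node.edges = {obs: resolve(succ) for obs, succ in node.edges.items()}
    return len(replacement)


def prune_unreachable(graph: dict[int, Node], root: int) -> int:
    """Remove nodes that no path of edges from the root reaches (BFS)."""
    seen = {root}
    queue = deque([root])
    while queue:
        for succ in graph[queue.popleft()].edges.values():
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    removed = [n for n in graph if n not in seen]
    for n in removed:
        del graph[n]
    return len(removed)


# Toy graph with two sampled states: node 2 is dominated by node 1,
# and node 4 has no incoming edges.
graph = {
    0: Node("a0", {"o1": 2, "o2": 3}, [4.0, 4.0]),
    1: Node("a1", {"o1": 0, "o2": 0}, [6.0, 1.0]),
    2: Node("a1", {"o1": 0, "o2": 0}, [5.0, 0.5]),
    3: Node("a2", {"o1": 0, "o2": 0}, [1.0, 6.0]),
    4: Node("a0", {"o1": 1, "o2": 3}, [0.0, 8.0]),
}
print(prune_dominated(graph, root=0))    # 1 (node 2 merged into node 1)
print(prune_unreachable(graph, root=0))  # 1 (node 4 dropped)
print(sorted(graph), graph[0].edges)     # [0, 1, 3] {'o1': 1, 'o2': 3}
```

In this toy run the order matters: replacing node 2 with node 1 makes node 1 reachable from the root, while node 4 stays unreachable and is removed afterwards. Pointwise dominance over a finite sample only approximates dominance over the continuous state space, so the quality of such a check depends on how the states are sampled.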