POPVI: A Probability-Based Optimal Policy Value Iteration Algorithm

Feng Liu, Bin Luo
DOI: https://doi.org/10.1007/978-3-319-13560-1_50
2014-01-01
Abstract: Point-based value iteration methods are a family of effective algorithms for solving POMDP models, and their performance depends mainly on how they explore the search space. Although algorithms such as HSVI and GapMin can reach a global optimum, their exploration of the optimal action is overly optimistic, which reduces their efficiency. In this paper, we propose a novel heuristic search method, POPVI (Probability-based Optimal Policy Value Iteration), which explores the optimal action based on probability. During depth-first heuristic exploration, the algorithm uses a Monte-Carlo method to estimate the probability that each action is optimal according to the distribution of the actions' Q-value functions, selects the action with the highest probability, and greedily explores the successor belief point with the greatest uncertainty. Experimental results show that POPVI outperforms HSVI, and does so by a large margin as the scale of the POMDP increases.