nso-HSVI: A Not-So-Optimistic Heuristic Search Value Iteration Algorithm for POMDPs

Feng Liu, Haibo Li, Chongjun Wang
DOI: https://doi.org/10.1109/ICTAI.2014.108
2014-01-01
Abstract: Point-based value iteration methods improve computational efficiency by reducing the size of the search space. Although algorithms such as HSVI and GapMin can reach a globally optimal solution, their selection of the action to explore is overly optimistic, which reduces their efficiency. In this paper, we propose a novel heuristic search method, nso-HSVI (not-so-optimistic Heuristic Search Value Iteration), which uses a Monte Carlo method to estimate the probability that each action is optimal according to the distribution of the actions' Q-value function, and then applies the action with the maximum probability. Experimental results show that nso-HSVI outperforms HSVI, and by a large margin as the scale of the POMDP increases.
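
The action-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each action's Q-value at the current belief is known only through a lower and an upper bound, and that the Q-value is uniformly distributed between them; the abstract does not specify the distribution, so the uniform choice is an assumption made here for illustration. The function names `estimate_optimal_action_probs`, `select_action`, and `optimistic_action` are hypothetical.

```python
import random


def estimate_optimal_action_probs(bounds, num_samples=10000, rng=None):
    """Monte Carlo estimate of P(action a has the highest Q-value).

    bounds: list of (lower, upper) Q-value bounds, one pair per action.
    Assumption (not from the abstract): each Q-value is uniform on its interval.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    counts = [0] * len(bounds)
    for _ in range(num_samples):
        # Draw one joint sample of Q-values and record which action wins.
        sample = [rng.uniform(lo, hi) for lo, hi in bounds]
        best = max(range(len(sample)), key=sample.__getitem__)
        counts[best] += 1
    return [c / num_samples for c in counts]


def select_action(bounds, num_samples=10000, rng=None):
    """Pick the action most likely to be optimal (the 'not-so-optimistic' choice)."""
    probs = estimate_optimal_action_probs(bounds, num_samples, rng)
    return max(range(len(probs)), key=probs.__getitem__)


def optimistic_action(bounds):
    """For comparison: pick the action with the highest upper bound."""
    return max(range(len(bounds)), key=lambda a: bounds[a][1])


if __name__ == "__main__":
    # Action 0 has a wide interval with the highest upper bound;
    # action 1 has a narrow interval that sits high.
    bounds = [(0.0, 10.0), (6.0, 8.0)]
    print("optimistic choice:", optimistic_action(bounds))
    print("estimated probabilities:", estimate_optimal_action_probs(bounds))
    print("max-probability choice:", select_action(bounds))
```

Under these assumed bounds, the purely optimistic rule selects action 0 because its upper bound is largest, while the sampled probabilities favor action 1: under the uniform assumption, action 0 exceeds action 1 with probability 0.3. This is the kind of disagreement between the two selection rules that the abstract describes.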