A Probabilistic Greedy Search Value Iteration Algorithm for POMDP

Feng Liu, Zebang Song
DOI: https://doi.org/10.1109/ICTAI.2016.0143
2016-01-01
Abstract: Point-based value iteration methods are an effective class of algorithms for solving POMDP models. MDP-based algorithms such as FSVI greatly reduce complexity and improve efficiency by using the optimal strategy of the underlying MDP, but their excessive randomness makes them unsuitable for realistic POMDP problems. This paper presents a probabilistic greedy search value iteration algorithm (PGSVI). PGSVI selects an action according to the weighted reward, explores the state for the next horizon in a probabilistic greedy manner based on the belief state and the transition function, and then samples an observation from those observations whose probability is greater than a threshold. PGSVI compensates for the shortcomings of FSVI and maintains efficiency by selecting more rational actions, states, and observations during exploration. Experimental results on four benchmarks show that PGSVI is very competitive with FSVI on POMDP problems with large-scale observation spaces.
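The three exploration choices named in the abstract (action, next state, observation) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption for illustration rather than the authors' method: the "weighted reward" is taken to be the belief-weighted immediate reward, the "probabilistic greedy" rule is modeled as picking the most likely next state with probability 1 - epsilon and sampling otherwise, and the names `select_action`, `explore_state`, `sample_observation`, `epsilon`, and `threshold`, as well as the array layouts `R[s][a]`, `T[s][a][s']`, and `O[a][s'][o]`, are hypothetical.

```python
import random


def select_action(belief, R):
    """Return the action with the largest belief-weighted immediate reward.

    Assumption: "weighted reward" means sum_s b(s) * R[s][a].
    """
    n_actions = len(R[0])
    weighted = [
        sum(b * R[s][a] for s, b in enumerate(belief))
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: weighted[a])


def next_state_distribution(belief, T, a):
    """P(s' | b, a) = sum_s b(s) * T[s][a][s']."""
    n_states = len(belief)
    return [
        sum(belief[s] * T[s][a][s2] for s in range(n_states))
        for s2 in range(n_states)
    ]


def explore_state(belief, T, a, epsilon, rng):
    """Assumed probabilistic greedy rule (not taken from the paper).

    With probability 1 - epsilon take the most likely next state,
    otherwise sample a next state from P(s' | b, a).
    """
    dist = next_state_distribution(belief, T, a)
    states = range(len(dist))
    if rng.random() >= epsilon:
        return max(states, key=lambda s: dist[s])
    return rng.choices(states, weights=dist)[0]


def sample_observation(O, a, s_next, threshold, rng):
    """Sample among observations whose probability O[a][s'][o] exceeds threshold."""
    probs = O[a][s_next]
    candidates = [o for o, p in enumerate(probs) if p > threshold]
    if not candidates:
        # Fallback (assumption): keep the single most likely observation.
        return max(range(len(probs)), key=lambda o: probs[o])
    weights = [probs[o] for o in candidates]
    return rng.choices(candidates, weights=weights)[0]


# Toy model (made-up numbers): 2 states, 2 actions, 3 observations.
belief = [0.7, 0.3]
R = [[1.0, 0.0], [0.0, 5.0]]
T = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
O = [
    [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
    [[0.6, 0.3, 0.1], [0.05, 0.6, 0.35]],
]

rng = random.Random(0)
action = select_action(belief, R)
state = explore_state(belief, T, action, epsilon=0.1, rng=rng)
observation = sample_observation(O, action, state, threshold=0.1, rng=rng)
```

Filtering observations by a threshold keeps the search away from very unlikely observation branches, which is presumably where the benefit on large observation spaces comes from; the fallback for an empty candidate set is a design choice of this sketch only.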