A Probabilistic Forward Search Value Iteration Algorithm for POMDP

Feng Liu, Cheng Lei, Hanyi Liu, Chongjun Wang
DOI: https://doi.org/10.1109/ictai.2019.00061
2019-01-01
Abstract: Point-based value iteration methods are a class of practical algorithms for solving POMDP models. The critical step in these methods is the exploration of the belief point set B. Forward search value iteration (FSVI) significantly reduces complexity and improves efficiency by using the optimal policy of the underlying MDP. However, it does not use the model's observations, which makes it less efficient on large-scale POMDP problems. This paper presents a probabilistic forward search value iteration algorithm (PFSVI) to address this shortcoming of FSVI. During exploration, PFSVI uses the alias method to sample the action a* according to a weighted Q_MDP function, and samples the state according to b and the transition function. PFSVI then selects the observation z that places the successor point b(a*,z) farthest from B. By sampling according to the environment model, PFSVI reaches a larger region of the belief space than FSVI and thereby improves solution quality. Experimental results on four benchmarks show that PFSVI achieves solutions closer to the global optimum than FSVI and PBVI, especially on large-scale problems.
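The two sampling steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the Q_MDP values are weighted or which distance to B is used, so the softmax weighting in `action_weights`, the `temperature` parameter, and the L1 distance in `farthest_observation` are assumptions made for illustration, and all function names are hypothetical. Only the alias method itself (Vose's variant) is a standard technique.

```python
import math
import random


def build_alias_table(weights):
    """Vose's alias method: O(n) setup, then O(1) draws from a discrete distribution."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]  # mean of scaled weights is 1
    prob = [0.0] * n
    alias = list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]      # keep s with this probability
        alias[s] = l             # otherwise redirect the draw to l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:      # leftovers are exactly (or numerically) 1
        prob[i] = 1.0
    return prob, alias


def alias_sample(prob, alias, rng=random):
    """Draw one index: pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def action_weights(belief, q_mdp, temperature=1.0):
    """ASSUMPTION: softmax over belief-averaged Q_MDP values; q_mdp[s][a] is Q_MDP(s, a)."""
    n_actions = len(q_mdp[0])
    q_b = [sum(b * q_mdp[s][a] for s, b in enumerate(belief))
           for a in range(n_actions)]
    q_max = max(q_b)             # subtract the max for numerical stability
    return [math.exp((q - q_max) / temperature) for q in q_b]


def farthest_observation(successors, belief_set):
    """ASSUMPTION: L1 distance. Return the z whose successor belief is farthest from B."""
    def dist_to_set(b):
        return min(sum(abs(x - y) for x, y in zip(b, p)) for p in belief_set)
    return max(successors, key=lambda z: dist_to_set(successors[z]))
```

In use, one would build the alias table once per belief from `action_weights(b, q_mdp)`, draw a* with `alias_sample`, compute the successor belief for each observation, and pass those to `farthest_observation` along with the current set B. The alias table pays off when many draws are made from the same distribution; for a single draw per belief, a plain cumulative-sum sampler would cost about the same.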