A Probability-Based Value Iteration on Optimal Policy Algorithm for POMDP

Feng LIU, Chong-jun WANG, Bin LUO
DOI: https://doi.org/10.3969/j.issn.0372-2112.2016.05.010
2016-01-01
Abstract: As POMDP problems in practical applications grow in scale, heuristic methods that explore the region reachable under the optimal policy have become an active research topic. However, the criteria that existing algorithms use to choose the best action are imperfect, which reduces their efficiency. This paper proposes a new value iteration method, PBVIOP (Probability-Based Value Iteration on Optimal Policy). During depth-first heuristic exploration, the method uses a Monte Carlo algorithm to estimate the probability that each action is optimal, based on the distribution of each action's Q-function value between its upper and lower bounds, and selects the action with the highest probability. Experimental results on four benchmarks show that PBVIOP obtains the globally optimal solution and significantly improves convergence efficiency.
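The action-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how a Q value is distributed between its bounds, so a uniform distribution is assumed here, and the function names, the `bounds` dictionary layout, and the sample count are all hypothetical.

```python
import random


def optimal_action_probabilities(bounds, n_samples=10000, seed=0):
    """Estimate, by Monte Carlo sampling, the probability that each action is optimal.

    bounds maps each action to a (lower, upper) pair bounding its Q value.
    ASSUMPTION: each Q value is uniformly distributed between its bounds;
    the abstract does not specify the distribution.
    """
    rng = random.Random(seed)
    wins = {action: 0 for action in bounds}
    for _ in range(n_samples):
        best_action = None
        best_value = float("-inf")
        # Draw one Q value per action and record which action comes out highest.
        for action, (lower, upper) in bounds.items():
            value = rng.uniform(lower, upper)
            if value > best_value:
                best_action, best_value = action, value
        wins[best_action] += 1
    # The fraction of samples an action wins approximates its probability of being optimal.
    return {action: count / n_samples for action, count in wins.items()}


def select_action(bounds, n_samples=10000, seed=0):
    """Return the action with the highest estimated probability of being optimal."""
    probabilities = optimal_action_probabilities(bounds, n_samples, seed)
    return max(probabilities, key=probabilities.get)
```

For example, with hypothetical bounds `{"a": (0.0, 1.0), "b": (0.5, 2.0)}`, action `b` wins most samples and is selected, even though the two intervals overlap. A rule that compares only upper bounds would pick the same action here, but the probability estimate also reflects how much of each interval lies above the competing actions.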