A Neighborhood-Based Value Iteration Algorithm for POMDP Problems

Feng Liu, Zheng Liu
DOI: https://doi.org/10.1109/ICTAI.2018.00126
2018-01-01
Abstract: The rapid growth of the search space has long been an obstacle to POMDP planning. Approximate value-function approaches such as GapMin explore belief points breadth-first, guided only by the gap between the lower and upper bounds of the optimal value function, which leaves room to improve how representative and effective the explored point set is. This paper presents Neighborhood-Based Value Iteration (NBVI), which also uses the distribution of belief points. NBVI uses the neighborhood relationships among explored points to partition the explored point set into domains. The average gap of the belief points in a domain serves as that domain's exploration value, which in turn determines the distribution of points considered for further exploration. When exploring successor points, NBVI filters them according to both their gap and their distance to the domains. Experimental results show that NBVI is very competitive with GapMin in both solution quality and convergence efficiency on large-scale problems.
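The domain construction and successor filtering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance metric, the neighborhood radius, the grouping procedure, or the exact filtering rule. The L1 metric, the greedy grouping, the `radius` and `min_gap` thresholds, and all function names below are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Belief = Tuple[float, ...]  # a probability vector over hidden states


def l1_distance(a: Belief, b: Belief) -> float:
    """L1 distance between two beliefs (assumed metric, not from the paper)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def group_into_domains(points: List[Belief], radius: float) -> List[List[Belief]]:
    """Greedy grouping (assumed): a point joins the first domain whose first
    member lies within `radius`; otherwise it starts a new domain."""
    domains: List[List[Belief]] = []
    for p in points:
        for d in domains:
            if l1_distance(p, d[0]) <= radius:
                d.append(p)
                break
        else:
            domains.append([p])
    return domains


def exploration_value(domain: List[Belief], gap: Dict[Belief, float]) -> float:
    """Average upper/lower bound gap over the beliefs in one domain."""
    return sum(gap[b] for b in domain) / len(domain)


def distance_to_domain(b: Belief, domain: List[Belief]) -> float:
    """Distance from a belief to its nearest member of the domain."""
    return min(l1_distance(b, m) for m in domain)


def keep_successor(succ: Belief, succ_gap: float,
                   domains: List[List[Belief]], gap: Dict[Belief, float],
                   min_gap: float, radius: float) -> bool:
    """Hypothetical filter: keep a successor when its own gap is large enough
    and it lies near the domain with the highest exploration value."""
    best = max(domains, key=lambda d: exploration_value(d, gap))
    return succ_gap >= min_gap and distance_to_domain(succ, best) <= radius


# Toy two-state example with made-up gap values.
explored = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
gaps = {(1.0, 0.0): 0.2, (0.9, 0.1): 0.4, (0.0, 1.0): 1.0}
toy_domains = group_into_domains(explored, radius=0.3)
print(len(toy_domains))
print(keep_successor((0.1, 0.9), 0.5, toy_domains, gaps, min_gap=0.3, radius=0.3))
```

In this toy setup the first two beliefs fall into one domain and the third forms its own, higher-gap domain, so a successor close to the third belief passes the filter while one close to the first two does not. Other readings of the filtering rule are equally consistent with the abstract, for example weighting candidates by domain value rather than using a hard threshold.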