Improving Online POMDP Planning Algorithms with Decaying Q Value

Qingya Wang, Feng Liu, Xuan Wang, Bin Luo
DOI: https://doi.org/10.1109/ictai59109.2023.00102
2023-01-01
Abstract: Online POMDP solvers search for the optimal policy by running multiple simulations. When scaling to large problems, more simulations typically lead to better results but also require more search time, so it is necessary to make the best use of a finite number of simulations. Note that these simulations are neither equivalent nor independent: the earlier ones tend to sample randomly, while the later ones can take advantage of previous results to better balance exploration and exploitation. Moreover, the environment may change during the planning procedure. For these reasons, we allocate different weights to the simulations according to their order and propose a general Decaying Q Value (DQV) method to improve existing online POMDP planning algorithms. To verify the effectiveness of the proposed method, we apply it to POMCPOW, one of the state-of-the-art algorithms. Several experiments show that DQV can achieve competitive results on large-scale problems.
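The abstract does not state the exact weighting scheme DQV uses. As a rough illustration only, the sketch below assumes an exponential decay. Each time a new simulation backs up a return through an action node, every earlier return is discounted by a factor `decay`, so later, better-informed simulations dominate the Q estimate. The class name `DecayingQ` and the parameter `decay` are hypothetical and do not come from the paper. This is not the authors' implementation of DQV or of POMCPOW.

```python
from dataclasses import dataclass


@dataclass
class DecayingQ:
    """Hypothetical decaying Q-value estimator for a single action node.

    Assumption (not from the paper): each new simulation return gets weight 1,
    and all earlier returns are multiplied by `decay` (0 < decay <= 1).
    With decay == 1.0 this reduces to the plain running mean used by
    standard UCT-style backups.
    """

    decay: float = 0.9
    weighted_sum: float = 0.0  # sum over past returns of decay**age * return
    total_weight: float = 0.0  # sum over past returns of decay**age
    visits: int = 0            # raw visit count, e.g. for a UCB exploration bonus

    def update(self, ret: float) -> None:
        # Age every previously stored return by one step, then add the new one.
        self.weighted_sum = self.decay * self.weighted_sum + ret
        self.total_weight = self.decay * self.total_weight + 1.0
        self.visits += 1

    @property
    def value(self) -> float:
        # Weighted average of the returns seen so far; 0.0 before any update.
        if self.total_weight == 0.0:
            return 0.0
        return self.weighted_sum / self.total_weight
```

For example, with `decay=0.5` and returns 0, 0, 10 arriving in that order, `value` is 10 / 1.75 ≈ 5.71, whereas the plain mean (`decay=1.0`) is 10 / 3 ≈ 3.33. Weighting recent simulations more heavily in this way would also let the estimate follow rewards that shift during planning, which the abstract gives as a second motivation.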