Solving driving policy for autonomous vehicles via AMDP-Q

Linfeng Xia, Hui Qian, Shenyi Chen, Zhuojun Jin
DOI: https://doi.org/10.13245/j.hust.2011.s2.001
2011-01-01
Abstract: Augmented Markov decision process Q-learning (AMDP-Q) was proposed, combining ideas from the augmented Markov decision process (AMDP), the Monte Carlo partially observable Markov decision process (MC-POMDP), and Q-learning. First, a lower-dimensional sufficient statistic was taken to represent the belief state space. In common situations, a good choice was the tuple consisting of the maximum-likelihood state and the entropy of the belief. The space spanned by this tuple was referred to as the augmented state space. Second, a set of reference states was used to discretize this space, with Q-learning and Shepard interpolation used to obtain the state transition probabilities and the reward function. Finally, an ε-greedy policy was applied to select actions for navigating the vehicle. The experimental results show that AMDP-Q converges faster than MC-POMDP.
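The three ingredients named in the abstract (augmented state, Shepard interpolation over reference states, ε-greedy selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a discrete belief given as a list of probabilities, treats the state index as a numeric coordinate when measuring distances, and interpolates Q-values directly over the reference states as a simplification. All function names, the `power` parameter, and the example numbers are hypothetical.

```python
import math
import random


def augmented_state(belief):
    """Map a discrete belief to (maximum-likelihood state index, entropy)."""
    ml_state = max(range(len(belief)), key=lambda i: belief[i])
    # Shannon entropy in nats; zero-probability states contribute nothing.
    entropy = -sum(p * math.log(p) for p in belief if p > 0.0)
    return ml_state, entropy


def shepard_weights(query, references, power=2.0):
    """Normalized inverse-distance (Shepard) weights of query over references."""
    dists = [math.dist(query, ref) for ref in references]
    for i, d in enumerate(dists):
        if d == 0.0:
            # Query coincides with a reference state: use that state alone.
            return [1.0 if j == i else 0.0 for j in range(len(references))]
    raw = [1.0 / d ** power for d in dists]
    total = sum(raw)
    return [w / total for w in raw]


def interpolate_q(query, references, q_table):
    """Shepard-interpolated Q-value for each action at the query point.

    q_table[k][a] is the (assumed) Q-value of action a at reference state k.
    """
    weights = shepard_weights(query, references)
    n_actions = len(q_table[0])
    return [
        sum(w * q_row[a] for w, q_row in zip(weights, q_table))
        for a in range(n_actions)
    ]


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    # Hypothetical example: 3 states, 2 actions, 3 reference states.
    belief = [0.7, 0.2, 0.1]
    state = augmented_state(belief)
    references = [(0, 0.0), (0, 1.0), (1, 0.5)]
    q_table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    q_values = interpolate_q(state, references, q_table)
    print(state, q_values, epsilon_greedy(q_values, epsilon=0.1))
```

Because the augmented state has only two dimensions regardless of the size of the underlying state space, the distance computation over reference states stays cheap, which is the motivation for replacing the full belief with this tuple.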