A Scalable Model-Free Recurrent Neural Network Framework for Solving POMDPs

Zhenzhen Liu, Itamar Elhanany
DOI: https://doi.org/10.1109/adprl.2007.368178
2007-04-01
Abstract: This paper presents a framework for obtaining an optimal policy in model-free Partially Observable Markov Decision Problems (POMDPs) using a recurrent neural network (RNN). A Q-function approximation approach is taken, utilizing a novel RNN architecture whose computation and storage requirements are dramatically reduced compared to existing schemes. A scalable online training algorithm, derived from the real-time recurrent learning (RTRL) algorithm, is employed. Moreover, stochastic meta-descent (SMD), an adaptive step-size scheme for stochastic gradient-descent problems, is used as a means of incorporating curvature information to accelerate the learning process. We consider case studies of POMDPs in which state information is not directly available to the agent. In particular, we investigate scenarios in which the agent receives identical observations for multiple states and must therefore rely on temporal dependencies captured by the RNN to obtain the optimal policy. Simulation results illustrate the effectiveness of the approach, along with a substantial improvement in convergence rate compared to existing schemes.
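To illustrate why identical observations for multiple states call for a recurrent Q-function, the following is a minimal sketch under stated assumptions. It uses a plain Elman-style recurrence with random, untrained weights, not the paper's architecture, and it omits RTRL and SMD training entirely. The names RecurrentQ, q_after, WALL, and CORRIDOR are hypothetical and chosen only for illustration.

```python
import math
import random


def make_weights(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (illustrative only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a list-of-lists matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class RecurrentQ:
    """Hypothetical Elman-style recurrent Q-function approximator (untrained)."""

    def __init__(self, n_obs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w_in = make_weights(n_hidden, n_obs, rng)      # observation -> hidden
        self.w_rec = make_weights(n_hidden, n_hidden, rng)  # hidden -> hidden
        self.w_out = make_weights(n_actions, n_hidden, rng)  # hidden -> Q-values
        self.n_hidden = n_hidden

    def initial_state(self):
        return [0.0] * self.n_hidden

    def step(self, hidden, obs):
        """Update the hidden state from one observation and return Q-values."""
        pre = [a + b for a, b in zip(matvec(self.w_in, obs),
                                     matvec(self.w_rec, hidden))]
        new_hidden = [math.tanh(p) for p in pre]
        return new_hidden, matvec(self.w_out, new_hidden)


def q_after(net, observations):
    """Feed an observation sequence and return the Q-values at the last step."""
    hidden = net.initial_state()
    q_values = None
    for obs in observations:
        hidden, q_values = net.step(hidden, obs)
    return q_values


# Two one-hot observations; CORRIDOR is assumed to be emitted by several states.
WALL = [1.0, 0.0]
CORRIDOR = [0.0, 1.0]

net = RecurrentQ(n_obs=2, n_hidden=4, n_actions=2)

# Both histories end in the same observation but arrive by different paths.
q_history_a = q_after(net, [WALL, CORRIDOR])
q_history_b = q_after(net, [CORRIDOR, CORRIDOR])
```

Because the hidden state carries the earlier observation forward, q_history_a and q_history_b differ even though the final observation is the same. A memoryless approximator that maps the current observation directly to Q-values would have to return the same values in both cases.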