Survey of apprenticeship learning based on reward function approximating

ZhuoJun Jin, Hui Qian, Shenyi Chen, Miaoliang Zhu
2008-01-01
Abstract: This paper surveys apprenticeship learning based on reward function approximation. Both the historical basis and a broad selection of current work are summarized. Two well-known frameworks, inverse reinforcement learning (IRL) and maximum margin planning (MMP), are discussed under both linear and nonlinear reward function assumptions. IRL-based learning is an iterative process that approaches the ideal reward function using a linear combination of basis functions, whereas MMP is a set of gradient-based algorithms for training cost-based planners. Bayesian filtering and a statistical representation of the policy function can be adopted to relax the requirement that demonstrations be optimal. Several areas for further research are also suggested, such as extensions to uncertain environments and undetermined policy learning.
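To illustrate the linear case mentioned in the abstract, the following is a minimal sketch, not the surveyed authors' method. It assumes a reward of the form R(s) = w · φ(s), where φ is a vector of basis functions, and computes the discounted feature expectations of demonstrated trajectories, the quantity that linear IRL methods typically try to match. The names `linear_reward`, `feature_expectations`, and `one_hot`, the three-state toy problem, and the discount factor are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence

State = int
BasisFn = Callable[[State], List[float]]


def linear_reward(weights: Sequence[float], phi: BasisFn, state: State) -> float:
    """Reward as a linear combination of basis functions: R(s) = w . phi(s)."""
    return sum(w * f for w, f in zip(weights, phi(state)))


def feature_expectations(
    trajectories: Sequence[Sequence[State]], phi: BasisFn, gamma: float
) -> List[float]:
    """Average discounted sum of basis-function values over the demonstrations."""
    k = len(phi(trajectories[0][0]))
    mu = [0.0] * k
    for trajectory in trajectories:
        for t, state in enumerate(trajectory):
            for i, f in enumerate(phi(state)):
                mu[i] += (gamma ** t) * f
    return [m / len(trajectories) for m in mu]


def one_hot(state: State, n_states: int = 3) -> List[float]:
    """Hypothetical basis: one indicator feature per state."""
    return [1.0 if i == state else 0.0 for i in range(n_states)]


# Toy demonstration (assumed data): the expert visits states 0, 1, 2 in order.
demos = [[0, 1, 2]]
mu_expert = feature_expectations(demos, one_hot, gamma=0.5)

# Under a linear reward, the discounted return of the demonstration is w . mu.
weights = [1.0, 2.0, 3.0]
expert_return = sum(w * m for w, m in zip(weights, mu_expert))
```

Because the reward is linear in the weights, the discounted return of any set of trajectories reduces to the dot product of the weights and the feature expectations, which is why the iterative procedure can compare expert and learner through these vectors alone.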