Abstract: In reinforcement learning, a reward function is an a priori specified mapping that tells the learning agent how well its current states and actions are performing. From the viewpoint of training, reinforcement learning requires no labeled data and has none of the errors that are induced in supervised learning, because responsibility is transferred from the loss function to the reward function. Methods that infer an approximated reward function from observed demonstrations are termed inverse reinforcement learning or apprenticeship learning: a reward function is generated that reproduces the observed behaviors. In previous studies, the reward function has been estimated using maximum likelihood, Bayesian, or information-theoretic methods. This study proposes an inverse reinforcement learning method in which the approximated reward function is a linear combination of feature expectations, each of which plays the role of a base weak classifier. The agent uses this approximated reward function to learn a policy, and the resulting behaviors are compared with an expert demonstration. The difference between the behaviors of the agent and those of the expert is measured using defined metrics, and the parameters of the approximated reward function are adjusted using an ensemble fuzzy method with boosting classification. After several interleaved iterations, the agent performs similarly to the expert demonstration. A fuzzy method is used to assign credit for the rewards of the most recent decision to the neighboring states. Using the proposed method, the agent approximates the expert behaviors in fewer steps. Simulation results demonstrate that the proposed method performs well in terms of sampling efficiency.
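
The loop described in the abstract (a linear reward over features, agent behavior compared with the expert, reward parameters adjusted) can be illustrated with a minimal Python sketch. It is not the paper's method: the ensemble fuzzy boosting update is replaced here by a plain step toward the expert's feature expectations, and the one-hot feature map `phi`, the discount `gamma`, the learning rate `lr`, and the toy trajectories are all assumptions made for illustration.

```python
import numpy as np

N_STATES = 3  # hypothetical toy state space: states 0, 1, 2


def phi(state):
    """One-hot feature vector for a state (assumed feature map)."""
    features = np.zeros(N_STATES)
    features[state] = 1.0
    return features


def feature_expectations(trajectories, gamma=0.5):
    """Average discounted sum of features over a set of state trajectories."""
    mu = np.zeros(N_STATES)
    for trajectory in trajectories:
        for t, state in enumerate(trajectory):
            mu += (gamma ** t) * phi(state)
    return mu / len(trajectories)


def reward(weights, state):
    """Approximated reward: a linear combination of the state's features."""
    return float(np.dot(weights, phi(state)))


def update_weights(weights, mu_expert, mu_agent, lr=1.0):
    """Move the weights toward the expert's feature expectations.

    Stand-in for the paper's ensemble fuzzy boosting update (assumption).
    """
    new_weights = weights + lr * (mu_expert - mu_agent)
    norm = np.linalg.norm(new_weights)
    return new_weights / norm if norm > 0 else new_weights


# Toy demonstration: the expert moves to state 2, the agent stays in state 0.
expert_trajectories = [[0, 1, 2, 2]]
agent_trajectories = [[0, 0, 0, 0]]

mu_expert = feature_expectations(expert_trajectories)
mu_agent = feature_expectations(agent_trajectories)
weights = update_weights(np.zeros(N_STATES), mu_expert, mu_agent)
```

After one update, states the expert visits more often than the agent receive a higher approximated reward than states the agent over-visits. In a full loop, the agent would re-learn its policy under the new reward and the comparison would repeat until the two behaviors are close.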

Policy Iteration Reinforcement Learning Based on Geodesic Gaussian Basis Defined on State-action Graph

Recursive Least Squares Policy Iteration Based on Geodesic Gaussian Basis Function

Parametric Approximation Policy Iteration Algorithm Based on Gaussian Process

Gaussian processes in inverse reinforcement learning

Gradient Q : A Unified Algorithm with Function Approximation for Reinforcement Learning

Integrating symmetry of environment by designing special basis functions for value function approximation in reinforcement learning

Representation Policy Iteration

Approximate policy iteration with unsupervised feature learning based on manifold regularization

Reinforcement learning with automatic basis construction based on isometric feature mapping

Nonparametric approximation generalized policy iteration reinforcement learning algorithm based on states clustering

Continuous-action reinforcement learning with fast policy search and adaptive basis function selection

A Clustering-Based Graph Laplacian Framework for Value Function Approximation in Reinforcement Learning

Least Squares Policy Iteration Based on Random Vector Basis

An Ensemble Fuzzy Approach for Inverse Reinforcement Learning

Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation

Gaussian Process Policy Optimization

Gaussian-Mixture-Model Q-Functions for Reinforcement Learning by Riemannian Optimization

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

Reinforcement Learning for Linear Exponential Quadratic Gaussian Problem

Bellman Gradient Iteration for Inverse Reinforcement Learning

Actor-Critic Algorithms With Epsilon-Greedy Gaussian Policy In Multidimensional Continuous Action Spaces