Abstract: In reinforcement learning, a reward function is an a priori specified mapping that tells the learning agent how well its current actions and states are performing. From a training viewpoint, reinforcement learning requires no labeled data and avoids the errors induced in supervised learning, because responsibility is transferred from the loss function to the reward function. Methods that infer an approximated reward function from observations of demonstrations are termed inverse reinforcement learning or apprenticeship learning: a reward function is generated that reproduces the observed behaviors. In previous studies, the reward function has been estimated using maximum likelihood, Bayesian, or information-theoretic methods. This study proposes an inverse reinforcement learning method in which the approximated reward function is a linear combination of feature expectations, each of which plays a role in a base weak classifier. The agent uses this approximated reward function to learn a policy, and the resulting behaviors are compared with an expert demonstration. The difference between the behaviors of the agent and those of the expert is measured using defined metrics, and the parameters of the approximated reward function are adjusted using an ensemble fuzzy method with boosting classification. After several interleaved iterations, the agent performs similarly to the expert demonstration. A fuzzy method is used to assign credit for the reward from the most recent decision to the neighboring states. Using the proposed method, the agent approximates the expert behaviors in fewer steps. Simulation results demonstrate that the proposed method performs well in terms of sampling efficiency.
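
The two ingredients named in the abstract, a reward that is linear in features and a metric comparing agent and expert behaviors, can be illustrated with a small sketch. It follows the generic feature-expectation-matching formulation from apprenticeship learning, not the ensemble fuzzy boosting update proposed in the study. The three-state chain, the one-hot feature map `phi`, the trajectories, and the discount factor 0.9 are all assumptions made for illustration.

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Average discounted sum of feature vectors over a set of trajectories."""
    totals = [
        sum((gamma ** t) * phi(s) for t, s in enumerate(traj))
        for traj in trajectories
    ]
    return np.mean(totals, axis=0)

def linear_reward(w, phi, state):
    """Approximated reward: a linear combination of the state's features."""
    return float(np.dot(w, phi(state)))

def phi(state):
    """Hypothetical one-hot features for a toy chain with states 0, 1, 2."""
    return np.eye(3)[state]

# Hypothetical demonstrations: the expert moves quickly to state 2.
expert_trajs = [[0, 1, 2, 2], [0, 1, 2, 2]]
# Hypothetical rollouts from the agent's current policy.
agent_trajs = [[0, 0, 1, 1], [0, 1, 1, 2]]

mu_expert = feature_expectations(expert_trajs, phi)
mu_agent = feature_expectations(agent_trajs, phi)

# Simple weight choice (an assumption, not the paper's update rule):
# point the reward weights along the expert-minus-agent difference.
w = mu_expert - mu_agent

# Behavior gap between agent and expert, measured in feature space.
gap = float(np.linalg.norm(w))

for s in range(3):
    print(f"state {s}: reward {linear_reward(w, phi, s):+.4f}")
print(f"feature-expectation gap: {gap:.4f}")
```

In this toy setting the weights assign the highest reward to state 2, which the expert visits more often than the agent does. An interleaved procedure would retrain the policy under `w`, recompute `mu_agent`, and repeat until the gap is small.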

Action Values Based Reinforcement Learning and Optimized Reward Functions

Time‐in‐action RL

A Novel Policy Based on Action Confidence Limit to Improve Exploration Efficiency in Reinforcement Learning

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

An Advantage-based Optimization Method for Reinforcement Learning in Large Action Space

A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters

Beyond Exponentially Discounted Sum: Automatic Learning of Return Function

Model predictive control-based value estimation for efficient reinforcement learning

Data Efficient Deep Reinforcement Learning with Action-Ranked Temporal Difference Learning

A Reward Optimization Method Based on Action Subrewards in Hierarchical Reinforcement Learning

An Ensemble Fuzzy Approach for Inverse Reinforcement Learning

A Reinforcement Learning Sampling Optimization Method Based on Training Value

A novel multi-step reinforcement learning method for solving reward hacking

Multi-Agent Reinforcement Learning with Optimal Equivalent Action of Neighborhood

Action Pick-up in Dynamic Action Space Reinforcement Learning

Improved SARSA and DQN algorithms for reinforcement learning

An Emotion-Based Approach to Reinforcement Learning Reward Design

An Approach to Optimize Replay Buffer in Value-Based Reinforcement Learning

Efficient Average Reward Reinforcement Learning Using Constant Shifting Values

Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning