An Efficient Unified Approach Using Demonstrations for Inverse Reinforcement Learning
Maxwell Hwang, Wei-Cheng Jiang, Yu-Jen Chen, Kao-Shing Hwang, Yi-Chia Tseng
DOI: https://doi.org/10.1109/tcds.2019.2957831
IF: 4.546
2021-09-01
IEEE Transactions on Cognitive and Developmental Systems
Abstract: A reinforcement learning (RL) agent is typically equipped with a hand-designed reward function that shapes its policy toward optimal decision making through interaction with an environment. However, designing an appropriate reward function for complex RL problems is difficult. Inverse RL (IRL) addresses this difficulty by deriving a reward function from the input of knowledgeable experts. In IRL, experts provide demonstrations that the agent imitates. Incorrect demonstrations also have merit, even though some of them resemble correct ones: with these clues, the agent can try to avoid the erroneous behavior they show. This article introduces an IRL method that uses two types of demonstrations, correct and incorrect, in the function approximation of a reward function. Given clues from these two opposite kinds of demonstrations, the agent iteratively approximates a reward function that guides it to behave like the expert's correct demonstrations and prevents it from making the same mistakes as the expert. The incorrect demonstrations give the agent guidelines for avoiding erroneous motions in the initial phase of learning. Two simulated tasks, a labyrinth and a robot soccer game, are conducted to validate the proposed method. The simulation results show that the proposed method generates an appropriate reward function and accomplishes apprenticeship learning with an efficient learning time in IRL.
robotics, computer science, artificial intelligence, neurosciences
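
The abstract's central idea, a reward function shaped by both correct and incorrect demonstrations, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes one-hot state features, a linear reward, and a single-step weight estimate (discounted feature expectations of correct demonstrations minus those of incorrect ones), whereas the abstract describes an iterative approximation. The function names, the `penalty` parameter, and the 5-state corridor are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def feature_expectation(trajs: Sequence[Sequence[int]], n_states: int,
                        gamma: float = 0.9) -> List[float]:
    """Average discounted state-visitation counts (one-hot state features)."""
    mu = [0.0] * n_states
    for traj in trajs:
        for t, state in enumerate(traj):
            mu[state] += gamma ** t
    return [m / len(trajs) for m in mu]


def reward_weights(correct: Sequence[Sequence[int]],
                   incorrect: Sequence[Sequence[int]],
                   n_states: int, gamma: float = 0.9,
                   penalty: float = 1.0) -> List[float]:
    """Assumed linear reward: pull toward correct demos, push away from incorrect ones."""
    mu_pos = feature_expectation(correct, n_states, gamma)
    mu_neg = feature_expectation(incorrect, n_states, gamma)
    w = [p - penalty * n for p, n in zip(mu_pos, mu_neg)]
    # Scale so the largest absolute weight is 1 (guard against an all-zero vector).
    norm = max(abs(x) for x in w) or 1.0
    return [x / norm for x in w]


# Hypothetical 5-state corridor: correct demos move right, incorrect demos move left.
correct_demos = [[2, 3, 4], [1, 2, 3, 4]]
incorrect_demos = [[2, 1, 0], [1, 0]]
weights = reward_weights(correct_demos, incorrect_demos, n_states=5)
```

Under these assumptions, states visited mainly in correct demonstrations receive positive reward weights and states visited mainly in incorrect demonstrations receive negative ones. A fuller treatment would repeat the estimate against the learning agent's own feature expectations until its behavior matches the correct demonstrations.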