Abstract: Research on robot automation, which aims to make robots capable of interacting directly with the world and with humans, has grown substantially. Learning from human demonstration is central to this effort. However, dependence on demonstrations restricts a robot to a fixed scenario: it cannot explore variant situations to accomplish the same task shown in the demonstration. Deep reinforcement learning is a promising way to let robots learn beyond human demonstration and complete tasks in unseen situations. Exploration is the core of such generalization to different environments, yet exploration in reinforcement learning can be ineffective and suffers from low sample efficiency. In this paper, we present Evolutionary Policy Gradient (EPG), which lets a robot learn from demonstration and perform goal-oriented exploration efficiently. Through goal-oriented exploration, our method generalizes learned skills to environments with different parameters. EPG combines parameter perturbation with policy gradient methods in the framework of Evolutionary Algorithms (EAs) and fuses the benefits of both, achieving effective and efficient exploration. With demonstrations guiding the evolutionary process, the robot can accelerate goal-oriented exploration and generalize its capability to variant scenarios. Experiments on robot control tasks in OpenAI Gym with dense and sparse rewards show that EPG provides performance competitive with the original policy gradient methods and EAs. In a manipulator task, our robot learns to open a door using vision in environments that differ from those in which the demonstrations were provided.
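
The combination of parameter perturbation and policy gradient inside an evolutionary loop can be illustrated with a minimal toy sketch. This is not the authors' EPG implementation: the one-step Gaussian policy, the quadratic reward, the elite-selection scheme, and every name below (`evaluate`, `policy_gradient_step`, `evolve`, `demo_theta`, `goal`) are assumptions made for illustration. In particular, demonstration guidance is modeled here simply as initializing the population near a demonstrated parameter vector.

```python
import numpy as np


def evaluate(theta, goal):
    """Fitness of a policy: negative squared distance of its mean action to the goal."""
    return -float(np.sum((theta - goal) ** 2))


def policy_gradient_step(theta, goal, rng, sigma=0.5, lr=0.05, n_samples=64):
    """One REINFORCE step for a Gaussian policy a ~ N(theta, sigma^2 I)."""
    actions = theta + sigma * rng.standard_normal((n_samples, theta.size))
    rewards = -np.sum((actions - goal) ** 2, axis=1)
    advantages = rewards - rewards.mean()  # mean baseline to reduce variance
    # grad log pi(a) with respect to theta is (a - theta) / sigma^2
    grad = (advantages[:, None] * (actions - theta)).mean(axis=0) / sigma**2
    return theta + lr * grad


def evolve(demo_theta, goal, generations=60, pop_size=8, n_elite=2,
           mutation_std=0.1, seed=0):
    """Toy evolutionary loop: gradient step, selection, then parameter perturbation."""
    rng = np.random.default_rng(seed)
    dim = demo_theta.size
    # Assumption: the demonstration seeds the initial population.
    population = [demo_theta + mutation_std * rng.standard_normal(dim)
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Each individual is refined by a policy gradient step.
        population = [policy_gradient_step(p, goal, rng) for p in population]
        # Selection: keep the fittest individuals unchanged.
        population.sort(key=lambda p: evaluate(p, goal), reverse=True)
        elites = population[:n_elite]
        # Mutation: refill the population with perturbed copies of the elites.
        children = [elites[i % n_elite] + mutation_std * rng.standard_normal(dim)
                    for i in range(pop_size - n_elite)]
        population = elites + children
    return max(population, key=lambda p: evaluate(p, goal))


if __name__ == "__main__":
    goal = np.array([2.0, -1.0])   # hypothetical target in a changed environment
    demo_theta = np.zeros(2)       # hypothetical parameters fitted to a demonstration
    best = evolve(demo_theta, goal)
    print("best parameters:", best, "fitness:", evaluate(best, goal))
```

In this sketch the gradient step supplies directed improvement while the perturbation step keeps the population diverse, which is one plausible reading of how the two are meant to complement each other. A real robot task would replace `evaluate` with episode returns from an environment.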

Synthesizing Programmatic Policy for Generalization Within Task Domain

Generalize Robot Learning from Demonstration to Variant Scenarios with Evolutionary Policy Gradient

Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search

Leveraging the Efficiency of Multi-Task Robot Manipulation Via Task-Evoked Planner and Reinforcement Learning

Zero-Shot Policy Transfer with Disentangled Task Representation of Meta-Reinforcement Learning

Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs

Multi-Task Policy Search

Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning

Improving Policy Optimization with Generalist-Specialist Learning

Learning Invariable Semantical Representation from Language for Extensible Policy Generalization

Zero-shot policy generation in lifelong reinforcement learning

Generalization of Compositional Tasks with Logical Specification via Implicit Planning

Generative Dialog Policy for Task-oriented Dialog Systems

Interpretable policy derivation for reinforcement learning based on evolutionary feature synthesis

Towards Mixed Optimization for Reinforcement Learning with Program Synthesis

Learning Universal Policies via Text-Guided Video Generation

Continual Task Allocation in Meta-Policy Network via Sparse Prompting

Generalization in Text-based Games via Hierarchical Reinforcement Learning

Prototypical context-aware dynamics generalization for high-dimensional model-based reinforcement learning

Policy Stitching: Learning Transferable Robot Policies

MER: Modular Element Randomization for Robust Generalizable Policy in Deep Reinforcement Learning