Abstract: Research on robot automation, which aims to make robots capable of interacting directly with the world and with humans, has grown substantially. Robot learning from human demonstration is central to this goal. However, dependence on demonstrations restricts the robot to a fixed scenario: it cannot explore variant situations to accomplish the same task shown in the demonstration. Deep reinforcement learning may allow a robot to learn beyond human demonstration and complete the task in unknown situations. Exploration is the core of such generalization to different environments, yet exploration in reinforcement learning can be ineffective and suffers from low sample efficiency. In this paper, we present Evolutionary Policy Gradient (EPG), which lets a robot learn from demonstration and perform goal-oriented exploration efficiently. Through goal-oriented exploration, our method generalizes the skill learned by the robot to environments with different parameters. EPG combines parameter perturbation with policy gradient methods in the framework of Evolutionary Algorithms (EAs) and fuses the benefits of both, achieving effective and efficient exploration. With demonstrations guiding the evolutionary process, the robot can accelerate goal-oriented exploration and generalize its capability to variant scenarios. Experiments on robot control tasks in OpenAI Gym with dense and sparse rewards show that EPG provides competitive performance compared with the original policy gradient methods and EAs. In the manipulator task, our robot learns to open a door using vision in environments that differ from those in which the demonstrations were provided.
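The abstract does not give algorithmic details, so the following is only a toy sketch of the general idea of mixing parameter perturbation with gradient-based refinement inside an evolutionary loop. It is not the authors' EPG implementation. Everything in it is an assumption made for illustration: the function names (`reward`, `gradient_step`, `perturb`, `evolve`), the quadratic toy reward standing in for an episode return, the analytic gradient standing in for a policy-gradient estimate, and the choice to seed the population around demonstration parameters.

```python
import random

def reward(theta, goal):
    # Toy stand-in for an episode return: negative squared distance to a goal.
    return -sum((t - g) ** 2 for t, g in zip(theta, goal))

def gradient_step(theta, goal, lr=0.1):
    # Analytic gradient ascent on the toy reward (assumption: a real agent
    # would use a sampled policy-gradient estimate here instead).
    return [t + lr * (-2.0 * (t - g)) for t, g in zip(theta, goal)]

def perturb(theta, rng, sigma=0.1):
    # Gaussian parameter perturbation, i.e. the mutation operator.
    return [t + rng.gauss(0.0, sigma) for t in theta]

def evolve(demo_theta, goal, generations=30, pop_size=8, n_elite=2, seed=0):
    rng = random.Random(seed)
    # Assumption: the demonstration guides the search by seeding the population.
    population = [perturb(demo_theta, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Mutate each member, then refine it with one gradient step.
        candidates = [gradient_step(perturb(p, rng), goal) for p in population]
        # Keep the best candidates and refill the population from them.
        candidates.sort(key=lambda th: reward(th, goal), reverse=True)
        elites = candidates[:n_elite]
        offspring = [perturb(rng.choice(elites), rng)
                     for _ in range(pop_size - n_elite)]
        population = elites + offspring
    return max(population, key=lambda th: reward(th, goal))

demo = [0.0, 0.0]    # hypothetical parameters obtained from a demonstration
goal = [1.0, -1.0]   # hypothetical optimum in a changed environment
best = evolve(demo, goal)
print(reward(demo, goal), reward(best, goal))
```

In this sketch the perturbation supplies exploration while the gradient step supplies directed improvement; the selection step decides which refined candidates survive to the next generation.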

Domain Adaptation of Visual Policies with a Single Demonstration

Generalize Robot Learning from Demonstration to Variant Scenarios with Evolutionary Policy Gradient

Prediction with Action: Visual Policy Learning via Joint Denoising Process

Learning with Dual Demonstration Domains: Random Domain-Adaptive Meta-Learning

AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent

One-Shot Domain-Adaptive Imitation Learning via Progressive Learning

Prompt-based Visual Alignment for Zero-shot Policy Transfer

Domain Adaptation Through Task Distillation

Adapting to Distribution Shift by Visual Domain Prompt Generation

Cross-Modal Domain Adaptation for Reinforcement Learning

VR-Goggles for Robots: Real-to-Sim Domain Adaptation for Visual Control

Domain Adaptive Imitation Learning with Visual Observation

MoVie: Visual Model-Based Policy Adaptation for View Generalization

Cross-Modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning

Adaptability Preserving Domain Decomposition for Stabilizing Sim2Real Reinforcement Learning

Cross-Domain Policy Adaptation via Value-Guided Data Filtering

Adapting Image-based RL Policies via Predicted Rewards

Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation

Task-conditioned adaptation of visual features in multi-task policy learning

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

Domain Adaptation In Reinforcement Learning Via Latent Unified State Representation