Abstract: Sim-to-real transfer is a powerful paradigm for robotic reinforcement learning. Training policies in simulation enables safe exploration and fast, low-cost, large-scale data collection. However, prior work on sim-to-real transfer of robotic policies typically does not involve human-robot interaction, because accurately simulating human behavior is an open problem. In this work, our goal is to leverage the power of simulation to train robotic policies that are proficient at interacting with humans upon deployment. This poses a chicken-and-egg problem: how can we gather examples of a human interacting with a physical robot, so as to model human behavior in simulation, without already having a robot that is able to interact with a human? Our proposed method, Iterative-Sim-to-Real (i-S2R), attempts to address this. i-S2R bootstraps from a simple model of human behavior and alternates between training in simulation and deploying in the real world. In each iteration, both the human behavior model and the policy are refined. For all training, we apply a new evolutionary search algorithm called Blackbox Gradient Sensing (BGS). We evaluate our method in a real-world robotic table tennis setting, where the objective for the robot is to play cooperatively with a human player for as long as possible. Table tennis is a high-speed, dynamic task that requires the two players to react quickly to each other's moves, making it a challenging test bed for research on human-robot interaction. We present results on an industrial robotic arm that is able to play table tennis cooperatively with human players, achieving rallies of 22 successive hits on average and 150 at best. Further, for 80% of players, rally lengths are 70% to 175% longer compared to the sim-to-real plus fine-tuning (S2R+FT) baseline. For videos of our system in action, please see https://sites.google.com/view/is2r.
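
The alternation described in the abstract (train in simulation against the current human behavior model, deploy, then refit the human model on the newly collected data) can be illustrated with a minimal toy sketch. Everything below is an assumption made for illustration: the scalar "policy" and "human model", the quadratic reward, the function names, and the hyperparameters are hypothetical and do not come from the paper. In particular, the optimizer is a generic antithetic evolution-strategies update used as a stand-in, not the Blackbox Gradient Sensing (BGS) algorithm itself.

```python
import random


def es_train(theta, human_param, rng, steps=200, sigma=0.1, lr=0.05):
    """Generic antithetic evolution-strategies update (a stand-in, NOT BGS)."""

    def reward(t):
        # Toy simulated reward: highest when the policy parameter
        # matches the current human behavior model.
        return -(t - human_param) ** 2

    for _ in range(steps):
        eps = rng.gauss(0.0, 1.0)
        # Finite-difference gradient estimate from a +/- perturbation pair.
        diff = reward(theta + sigma * eps) - reward(theta - sigma * eps)
        theta += lr * diff / (2.0 * sigma) * eps
    return theta


def deploy(theta, true_human, n, rng):
    """Hypothetical real-world rollout returning noisy human observations.

    Simplification: in this toy the observations do not depend on theta,
    whereas real human behavior would depend on how the robot plays.
    """
    return [true_human + rng.gauss(0.0, 0.05) for _ in range(n)]


def iterative_sim_to_real(true_human=1.0, initial_human_model=0.0,
                          iterations=3, seed=0):
    rng = random.Random(seed)
    human_model = initial_human_model  # bootstrap from a simple model
    theta = 0.0
    data = []
    for _ in range(iterations):
        theta = es_train(theta, human_model, rng)    # train in simulation
        data += deploy(theta, true_human, 20, rng)   # deploy with the human
        human_model = sum(data) / len(data)          # refine the human model
    theta = es_train(theta, human_model, rng)        # final training pass
    return theta, human_model


if __name__ == "__main__":
    theta, human_model = iterative_sim_to_real()
    print(f"policy parameter: {theta:.3f}, human model: {human_model:.3f}")
```

In this toy setup both the human model and the policy parameter drift toward the hidden `true_human` value over the iterations, which mirrors, in a very reduced form, the idea that each round of deployment yields data that improves the simulated human and hence the next policy.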

Enhancing Two-Player Performance Through Single-Player Knowledge Transfer: An Empirical Study on Atari 2600 Games

Self-play Reinforcement Learning with Comprehensive Critic in Computer Games

Probing Transfer in Deep Reinforcement Learning without Task Engineering

Learning from Human Reward Benefits from Socio-Competitive Feedback

Playing Atari Games with Deep Reinforcement Learning and Human Checkpoint Replay

Asynchronous Advantage Actor-Critic Agent for Starcraft II

Strategy Selection In Complex Game Environments Based On Transfer Reinforcement Learning

Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals

Learning to Play Two-Player Perfect-Information Games without Knowledge

Playing Atari Ball Games with Hierarchical Reinforcement Learning

Pixel to policy: DQN Encoders for within & cross-game reinforcement learning

High Performance Across Two Atari Paddle Games Using the Same Perceptual Control Architecture Without Training

A Neuroevolution Approach to General Atari Game Playing

Social Interaction for Efficient Agent Learning from Human Reward

Player Co-Modelling in a Strategy Board Game: Discovering How to Play Fast

i-Sim2Real: Reinforcement Learning of Robotic Policies in Tight Human-Robot Interaction Loops

Learning on a Budget via Teacher Imitation

Playing SNES in the Retro Learning Environment

Evaluation and Learning in Two-Player Symmetric Games via Best and Better Responses

AGB stars in extragalactic systems

Hybrid Training Strategies: Improving Performance of Temporal Difference Learning in Board Games