AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents

Jieming Cui, Tengyu Liu, Nian Liu, Yaodong Yang, Yixin Zhu, Siyuan Huang

2024-03-19

Abstract: Traditional approaches in physics-based motion generation, centered around imitation learning and reward shaping, often struggle to adapt to new scenarios. To tackle this limitation, we propose AnySkill, a novel hierarchical method that learns physically plausible interactions following open-vocabulary instructions. Our approach begins by developing a set of atomic actions via a low-level controller trained with imitation learning. Upon receiving an open-vocabulary textual instruction, AnySkill employs a high-level policy that selects and integrates these atomic actions to maximize the CLIP similarity between the agent's rendered images and the text. An important feature of our method is the use of image-based rewards for the high-level policy, which allows the agent to learn interactions with objects without manual reward engineering. We demonstrate AnySkill's capability to generate realistic and natural motion sequences in response to unseen instructions of varying lengths, making it the first method capable of open-vocabulary physical skill learning for interactive humanoid agents.

Computer Vision and Pattern Recognition, Robotics

What problem does this paper attempt to address?

The paper addresses physical skill learning for interactive virtual agents: how to enable such agents to generate natural, physically plausible action sequences from open-vocabulary instructions (i.e., textual descriptions not seen during training). It proposes a method called AnySkill, which combines a low-level controller with a high-level policy. The low-level controller acquires a set of atomic actions through Generative Adversarial Imitation Learning (GAIL). The high-level policy then selects and combines these actions, guided by an image-based reward that measures how well the agent's rendered images match the given textual instruction. Because the reward is computed from images and text, virtual agents can learn interactive tasks in new scenarios without manually designed reward functions. The paper reports that AnySkill executes a variety of open-vocabulary physical skills and outperforms existing methods in both qualitative and quantitative evaluations. It also shows AnySkill interacting with dynamic objects (such as a soccer ball and a door), which supports its potential for use in more complex environments.
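To make the image-based reward concrete, here is a minimal sketch of a similarity score between an image embedding and a text embedding. It assumes the reward is a cosine similarity, which is how CLIP similarity is commonly computed; the paper's exact reward formulation is not given in this summary. The function names `cosine_similarity` and `image_text_reward` and the toy three-dimensional vectors are hypothetical stand-ins for real CLIP embeddings of a rendered frame and an instruction.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so treat its similarity as 0.
    return dot / norm if norm > 0.0 else 0.0


def image_text_reward(image_embedding: Sequence[float],
                      text_embedding: Sequence[float]) -> float:
    """Hypothetical reward: higher when the rendered image matches the text.

    Assumption: both embeddings come from the same image-text encoder
    (for example CLIP). Here they are supplied directly as plain lists.
    """
    return cosine_similarity(image_embedding, text_embedding)


# Toy embeddings (illustrative values, not real CLIP outputs).
text_embedding = [1.0, 0.0, 0.0]
aligned_frame = [2.0, 0.0, 0.0]     # same direction as the text embedding
unrelated_frame = [0.0, 3.0, 0.0]   # orthogonal to the text embedding

reward_aligned = image_text_reward(aligned_frame, text_embedding)
reward_unrelated = image_text_reward(unrelated_frame, text_embedding)
```

In this sketch a frame whose embedding points in the same direction as the instruction scores higher than an unrelated frame, regardless of vector magnitude. A high-level policy trained with reinforcement learning would receive such a score after each rendered step in place of a hand-written task reward.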

Grounding Language for Robotic Manipulation via Skill Library

SFV: Reinforcement Learning of Physical Skills from Videos

Strategy and Skill Learning for Physics-based Table Tennis Animation

Responsive Action Generation By Physically-Based Motion Retrieval And Adaptation

ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters

Learning Uniformly Distributed Embedding Clusters of Stylistic Skills for Physically Simulated Characters

C$\cdot$ASE: Learning Conditional Adversarial Skill Embeddings for Physics-based Characters

DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills

Choreographer: Learning and Adapting Skills in Imagination

PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play

Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation

Agentic Skill Discovery

Learning Responsive Humanoid Motion Skills from Graph-Powered Motion Matching

Learning Intuitive Physics and One-Shot Imitation Using State-Action-Prediction Self-Organizing Maps

Visuospatial Skill Learning for Robots

Physics-based Motion Retargeting from Sparse Inputs

SIMS: Simulating Human-Scene Interactions with Real World Script Planning

Skill Generalization with Verbs

Unsupervised Skill Discovery for Robotic Manipulation through Automatic Task Generation

Learning Models For Constraint-Based Motion Parameterization From Interactive Physics-Based Simulation