Abstract: Generalizable articulated object manipulation is essential for home-assistant robots. Recent efforts focus on imitation learning from demonstrations or reinforcement learning in simulation. However, because real-world data collection and precise object simulation are prohibitively costly, these approaches still struggle to achieve broad adaptability across diverse articulated objects. Recently, many works have tried to use the strong in-context learning ability of Large Language Models (LLMs) to achieve generalizable robotic manipulation, but most of this research focuses on high-level task planning and sidelines low-level robotic control. In this work, building on the idea that the kinematic structure of an object determines how we can manipulate it, we propose a kinematic-aware prompting framework that prompts LLMs with kinematic knowledge of objects to generate low-level motion trajectory waypoints, supporting the manipulation of various objects. To effectively prompt LLMs with the kinematic structure of different objects, we design a unified kinematic knowledge parser, which represents various articulated objects as a unified textual description containing kinematic joints and contact location. Building upon this unified description, a kinematic-aware planner model is proposed to generate precise 3D manipulation waypoints via a designed kinematic-aware chain-of-thought prompting method. Our evaluation spans 48 instances across 16 distinct categories and shows that our framework not only outperforms traditional methods on 8 seen categories but also exhibits strong zero-shot capability on 8 unseen articulated object categories. Moreover, real-world experiments on 7 different object categories demonstrate our framework's adaptability in practical scenarios. Code is released at https://github.com/GeWu-Lab/LLM_articulated_object_manipulation/tree/main.
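
To make the idea of a unified textual description concrete, the following is a minimal sketch of how an articulated object's joints and contact location might be rendered as text for an LLM prompt. It is not the released implementation: the class names (`Joint`, `ArticulatedObject`), the field names, the `describe` function, the output format, and the example drawer values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Joint:
    # Hypothetical joint record; field names are illustrative only.
    name: str
    joint_type: str  # e.g. "revolute" or "prismatic"
    origin: Vec3  # a point on the joint axis
    axis: Vec3  # direction of the joint axis
    limits: Tuple[float, float]  # lower and upper motion limits


@dataclass
class ArticulatedObject:
    # Hypothetical container for one object's kinematic knowledge.
    category: str
    joints: List[Joint]
    contact_location: Vec3  # where the gripper should make contact


def fmt(v: Vec3) -> str:
    """Format a 3D vector with two decimals so the prompt stays compact."""
    return "(" + ", ".join(f"{x:.2f}" for x in v) + ")"


def describe(obj: ArticulatedObject) -> str:
    """Render joints and contact location as one plain-text description."""
    lines = [f"Object: {obj.category}"]
    for j in obj.joints:
        lines.append(
            f"Joint {j.name}: type={j.joint_type}, origin={fmt(j.origin)}, "
            f"axis={fmt(j.axis)}, limits=[{j.limits[0]:.2f}, {j.limits[1]:.2f}]"
        )
    lines.append(f"Contact location: {fmt(obj.contact_location)}")
    return "\n".join(lines)


# Made-up example values: a drawer that slides along the x axis.
drawer = ArticulatedObject(
    category="drawer",
    joints=[
        Joint(
            name="drawer_slide",
            joint_type="prismatic",
            origin=(0.0, 0.0, 0.3),
            axis=(1.0, 0.0, 0.0),
            limits=(0.0, 0.4),
        )
    ],
    contact_location=(0.45, 0.0, 0.3),
)

print(describe(drawer))
```

Because every object category is reduced to the same joint and contact fields, a description like this could be placed in a prompt alongside a task instruction, regardless of whether the object is a drawer, a door, or a faucet.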

GenCHiP: Generating Robot Policy Code for High-Precision and Contact-Rich Manipulation Tasks

Code as Policies: Language Model Programs for Embodied Control

Meta-Policy Learning over Plan Ensembles for Robust Articulated Object Manipulation

Spatial-Language Attention Policies for Efficient Robot Learning

Local Policies Enable Zero-shot Long-horizon Manipulation

Robust and High-Precision End-to-End Control Policy for Multi-stage Manipulation Task with Behavioral Cloning

Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs

Constraint-aware Policy for Compliant Manipulation

Language to Rewards for Robotic Skill Synthesis

Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs

Leveraging Commonsense Knowledge from Large Language Models for Task and Motion Planning

Automatic Behavior Tree Expansion with LLMs for Robotic Manipulation

Empowering Large Language Models on Robotic Manipulation with Affordance Prompting

Task and Motion Planning with Large Language Models for Object Rearrangement

ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation

Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation

A Hierarchical Compliance-Based Contextual Policy Search for Robotic Manipulation Tasks With Multiple Objectives

Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation

Prompt, Plan, Perform: LLM-based Humanoid Control via Quantized Imitation Learning

DeliGrasp: Inferring Object Properties with LLMs for Adaptive Grasp Policies