Abstract: Dexterous robotic manipulation remains a significant challenge due to the high dimensionality and complexity of hand movements required for tasks like in-hand manipulation and object grasping. This paper addresses this issue by introducing Vector Quantized Action Chunking Embedding (VQ-ACE), a novel framework that compresses human hand motion into a quantized latent space, significantly reducing the action space's dimensionality while preserving key motion characteristics. By integrating VQ-ACE with both Model Predictive Control (MPC) and Reinforcement Learning (RL), we enable more efficient exploration and policy learning in dexterous manipulation tasks using a biomimetic robotic hand. Our results show that latent space sampling with MPC produces more human-like behavior in tasks such as Ball Rolling and Object Picking, leading to higher task success rates and reduced control costs. For RL, action chunking accelerates learning and improves exploration, demonstrated through faster convergence in tasks like cube stacking and in-hand cube reorientation. These findings suggest that VQ-ACE offers a scalable and effective solution for robotic manipulation tasks involving complex, high-dimensional state spaces, contributing to more natural and adaptable robotic systems.

What problem does this paper attempt to address?

The paper targets the high dimensionality and complexity of dexterous robotic manipulation, particularly the hand movements required for tasks such as in-hand manipulation and object grasping. Specifically:

1. **High-dimensional action space**: The human hand has 27 degrees of freedom (DoF) and can perform a wide variety of complex movements and postures, which makes imitating its fine manipulation a major challenge in robotics.
2. **Complex hand movements**: Tasks such as Ball Rolling and Object Picking require precise control of these high-dimensional action sequences, which places high demands on existing robotic systems.

To address these problems, the paper introduces a new framework, **Vector Quantized Action Chunking Embedding (VQ-ACE)**. VQ-ACE compresses human hand motion into a quantized latent space, significantly reducing the dimensionality of the action space while retaining key motion characteristics. This enables more efficient policy search and learning for dexterous manipulation with a biomimetic robotic hand.

### Main contributions

1. **VQ-ACE framework**: Embeds human hand action sequences into quantized latent representations.
2. **Model Predictive Control (MPC) with latent sampling**: A real-time action synthesis algorithm that samples in the latent space.
3. **Reinforcement Learning (RL) with action chunks**: Uses action priors to improve exploration in RL and accelerate learning.

In the reported experiments, latent-space sampling with MPC achieved higher task success rates and lower control costs in tasks such as Ball Rolling and Object Picking. Action chunking also accelerated RL convergence and improved exploration, as shown in cube stacking and in-hand cube reorientation.

In conclusion, VQ-ACE provides a scalable and effective solution for robotic manipulation tasks involving complex, high-dimensional state spaces, which helps to realize more natural and adaptable robotic systems.
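
The two core ideas summarized above, quantizing a latent vector against a codebook and planning by sampling in that latent space, can be illustrated with a toy sketch. This is not the paper's implementation: the codebook values, the `toy_decode` decoder, the `chunk_cost` function, and all parameter names are hypothetical stand-ins chosen only to make the example self-contained. In the paper the decoder would be a learned network and the cost would come from the task.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z, codebook):
    """Vector quantization: return (index, code) of the codebook entry nearest to z."""
    idx = min(range(len(codebook)), key=lambda i: sqdist(z, codebook[i]))
    return idx, codebook[idx]


def toy_decode(code, horizon=4):
    """Hypothetical decoder: expand one latent code into a chunk of `horizon` actions.

    A learned decoder would go here; this toy version linearly ramps toward the code.
    """
    return [[c * (t + 1) / horizon for c in code] for t in range(horizon)]


def chunk_cost(chunk, target):
    """Toy cost (assumption): distance of the final action to a target plus an effort term."""
    tracking = sqdist(chunk[-1], target)
    effort = 0.01 * sum(a * a for step in chunk for a in step)
    return tracking + effort


def latent_sampling_plan(codebook, target, num_samples=8, rng=None):
    """Sample codebook indices, decode each into an action chunk, keep the cheapest one."""
    rng = rng or random.Random(0)
    best = None
    for _ in range(num_samples):
        idx = rng.randrange(len(codebook))
        chunk = toy_decode(codebook[idx])
        cost = chunk_cost(chunk, target)
        if best is None or cost < best[0]:
            best = (cost, idx, chunk)
    return best


# Toy 2-D codebook with four entries (made-up values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

idx, code = quantize([0.9, 0.2], codebook)
cost, best_idx, best_chunk = latent_sampling_plan(codebook, target=[1.0, 1.0], num_samples=64)
```

The planner searches over a small discrete set of codes instead of the raw per-step joint commands, which is the dimensionality reduction the summary describes. Each sampled code also expands into a whole chunk of actions, so one decision covers several control steps.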

VQ-ACE: Efficient Policy Search for Dexterous Robotic Manipulation via Action Chunking Embedding

Learning Robot Manipulation Skills from Human Demonstration Videos Using Two-Stream 2-D/3-D Residual Networks with Self-Attention

Dexterous Manoeuvre Through Touch in a Cluttered Scene

A High-Efficient Reinforcement Learning Approach for Dexterous Manipulation

Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation

Composable Deep Reinforcement Learning for Robotic Manipulation

Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation

ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos

Object-Centric Dexterous Manipulation from Human Motion Data

Q-Attention: Enabling Efficient Learning for Vision-Based Robotic Manipulation

Dexterous Manipulation from Images: Autonomous Real-World RL via Substep Guidance

Learning Deep Visuomotor Policies for Dexterous Hand Manipulation

Data-efficient Deep Reinforcement Learning for Dexterous Manipulation

Dexterous Manipulation with Deep Reinforcement Learning: Efficient, General, and Low-Cost

InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation

Vision-Language-Action Model and Diffusion Policy Switching Enables Dexterous Control of an Anthropomorphic Hand

Dexterous In-Hand Manipulation of Slender Cylindrical Objects through Deep Reinforcement Learning with Tactile Sensing

VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation

Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning

Dexterous Imitation Made Easy: A Learning-Based Framework for Efficient Dexterous Manipulation