Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation

Kun Wu, Yichen Zhu, Jinming Li, Junjie Wen, Ning Liu, Zhiyuan Xu, Qinru Qiu, Jian Tang
2024-10-26
Abstract: Learning visuomotor policy for multi-task robotic manipulation has been a long-standing challenge for the robotics community. The difficulty lies in the diversity of action space: typically, a goal can be accomplished in multiple ways, resulting in a multimodal action distribution for a single task. The complexity of action distribution escalates as the number of tasks increases. In this work, we propose **Discrete Policy**, a robot learning method for training universal agents capable of multi-task manipulation skills. Discrete Policy employs vector quantization to map action sequences into a discrete latent space, facilitating the learning of task-specific codes. These codes are then reconstructed into the action space conditioned on observations and language instruction. We evaluate our method on both simulation and multiple real-world embodiments, including both single-arm and bimanual robot settings. We demonstrate that our proposed Discrete Policy outperforms a well-established Diffusion Policy baseline and many state-of-the-art approaches, including ACT, Octo, and OpenVLA. For example, in a real-world multi-task training setting with five tasks, Discrete Policy achieves an average success rate that is 26% higher than Diffusion Policy and 15% higher than OpenVLA. As the number of tasks increases to 12, the performance gap between Discrete Policy and Diffusion Policy widens to 32.5%, further showcasing the advantages of our approach. Our work empirically demonstrates that learning multi-task policies within the latent space is a vital step toward achieving general-purpose agents.
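The vector-quantization step described in the abstract can be illustrated with a minimal nearest-neighbor codebook lookup. This is a sketch under stated assumptions, not the authors' implementation: each encoded chunk of an action sequence is assumed to be a plain list of floats, the codebook is a small hypothetical list of vectors, and squared Euclidean distance is assumed, as in the standard VQ-VAE formulation. The helpers `squared_distance` and `quantize` are illustrative names.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Map each continuous latent vector to its nearest codebook entry.

    Hypothetical helper (not from the paper): returns the discrete code
    indices and the quantized vectors a decoder would consume.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in latents:
        # Choose the codebook entry with the smallest squared distance to z.
        k = min(range(len(codebook)), key=lambda i: squared_distance(z, codebook[i]))
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized


# Hypothetical 2-D codebook with three entries and two encoded action chunks.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
latents = [[0.9, 1.2], [-0.8, 0.4]]
codes, zq = quantize(latents, codebook)
print(codes)  # [1, 2]
```

The integer indices are what make the latent action space discrete. A VQ-VAE is typically trained with a straight-through gradient estimator plus codebook and commitment losses; this forward-only sketch omits training entirely.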
Robotics
What problem does this paper attempt to address?
This paper addresses the challenges posed by the complexity and diversity of the action space in multi-task robotic manipulation. Traditional robot systems usually focus on specific tasks, but in modern dynamic environments robots need the versatility to adapt to many different situations. The action distribution of each task is often multimodal, and the distributions become more complex and entangled as the number of tasks grows, which makes it difficult for a single policy to learn and execute many tasks. To address this challenge, the authors propose Discrete Policy, which aims to disentangle the action space in multi-task robotic manipulation through discrete policy learning. Discrete Policy uses vector quantization to map action sequences into a discrete latent space, facilitating the learning of task-specific codes. These codes are then reconstructed into the action space conditioned on observations and language instructions. In this way, Discrete Policy handles complex, multimodal action distributions more effectively and performs well in multi-task settings.

### Key questions

1. **Can it be effectively deployed in real-world scenarios?**
2. **Can it be extended to multiple complex tasks?**
3. **Can it effectively distinguish behavioral patterns across different tasks?**

### Method overview

Discrete Policy consists of two main parts:

1. **Training phase 1**: A Vector-Quantized Variational Auto-Encoder (VQ-VAE) encodes complex action sequences into a discrete latent space, and a decoder reconstructs the actions from these codes.
2. **Training phase 2**: A conditional diffusion model generates task-specific latent embeddings that guide the decoder toward the appropriate action mode.

### Experimental results

Experiments show that Discrete Policy significantly outperforms strong existing baselines, such as Diffusion Policy and OpenVLA, on multiple tasks.
The advantage of Discrete Policy becomes more pronounced as the number of tasks grows. For example, in a real-world multi-task training setting with 5 tasks, the average success rate of Discrete Policy is 26% higher than that of Diffusion Policy and 15% higher than that of OpenVLA. When the number of tasks increases to 12, the gap between Discrete Policy and Diffusion Policy widens to 32.5%.

### Conclusion

Discrete Policy offers a new approach to learning multi-task robot control policies and achieves better disentanglement of feature representations in complex multi-task environments. Extensive simulation and real-world experiments demonstrate its superior performance in multi-task settings.
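The summary does not specify the diffusion formulation used in training phase 2 of the method. Assuming a standard DDPM-style setup (a linear noise schedule and a noise-prediction objective), which is an assumption rather than the authors' confirmed design, one training step of the conditional latent prior can be sketched as follows. Here `z0` stands for a latent embedding produced by the phase 1 VQ-VAE, `cond` for observation and language-instruction features, and `denoiser` for a hypothetical conditional network; all function names are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def linear_alpha_bars(
    num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02
) -> List[float]:
    """Cumulative products alpha_bar_t for a linear beta schedule (assumed)."""
    alpha_bars: List[float] = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def add_noise(z0: Sequence[float], eps: Sequence[float], alpha_bar: float) -> Vector:
    """Forward process: z_t = sqrt(alpha_bar) * z0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(z0, eps)]


def denoising_loss(
    z0: Sequence[float],
    cond: Sequence[float],
    denoiser: Callable[[Vector, int, Sequence[float]], Vector],
    alpha_bars: Sequence[float],
    rng: random.Random,
) -> float:
    """One Monte Carlo sample of the noise-prediction loss for latent z0.

    `denoiser` is a hypothetical network that sees the noisy latent, the
    timestep, and the conditioning features, and predicts the added noise.
    """
    t = rng.randrange(len(alpha_bars))
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    z_t = add_noise(z0, eps, alpha_bars[t])
    eps_hat = denoiser(z_t, t, cond)
    # Mean squared error between true and predicted noise.
    return sum((e - p) ** 2 for e, p in zip(eps, eps_hat)) / len(eps)


def zero_denoiser(z_t: Vector, t: int, cond: Sequence[float]) -> Vector:
    """Toy stand-in network that always predicts zero noise."""
    return [0.0] * len(z_t)


alpha_bars = linear_alpha_bars(100)
loss = denoising_loss(
    [0.5, -0.2, 1.0], cond=[1.0], denoiser=zero_denoiser,
    alpha_bars=alpha_bars, rng=random.Random(0),
)
```

At inference, a model trained this way would be run in reverse from Gaussian noise to produce a latent embedding for the decoder to map back to actions; the reverse sampler is omitted from this sketch.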