Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models

Jiahang Cao, Qiang Zhang, Jingkai Sun, Jiaxu Wang, Hao Cheng, Yulin Li, Jun Ma, Yecheng Shao, Wen Zhao, Gang Han, Yijie Guo, Renjing Xu
2024-09-11
Abstract: Diffusion models have been widely employed in the field of 3D manipulation due to their efficient capability to learn distributions, allowing for precise prediction of action trajectories. However, diffusion models typically rely on large parameter UNet backbones as policy networks, which can be challenging to deploy on resource-constrained devices. Recently, the Mamba model has emerged as a promising solution for efficient modeling, offering low computational complexity and strong performance in sequence modeling. In this work, we propose the Mamba Policy, a lighter but stronger policy that reduces the parameter count by over 80% compared to the original policy network while achieving superior performance. Specifically, we introduce the XMamba Block, which effectively integrates input information with conditional features and leverages a combination of Mamba and Attention mechanisms for deep feature extraction. Extensive experiments demonstrate that the Mamba Policy excels on the Adroit, Dexart, and MetaWorld datasets, requiring significantly fewer computational resources. Additionally, we highlight the Mamba Policy's enhanced robustness in long-horizon scenarios compared to baseline methods and explore the performance of various Mamba variants within the Mamba Policy framework. Our project page is at https://andycao1125.github.io/mamba_policy/.
Robotics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the difficulty of deploying existing 3D diffusion policy models on resource-constrained devices. Current 3D diffusion policies such as DP3 use a large-parameter UNet as the policy network, which incurs significant computational overhead and memory usage and makes the models hard to deploy and run efficiently on edge devices. Their performance on long-horizon prediction tasks also leaves room for improvement.

To address these problems, the paper proposes Mamba Policy, a lighter but stronger policy model. Its XMamba block combines Mamba's selective state space model with an attention mechanism, reducing the parameter count by more than 80% while achieving better performance than existing methods on several benchmarks (Adroit, DexArt, and MetaWorld). Mamba Policy is also more robust in long-horizon scenarios, which broadens its potential for complex tasks.

### Summary of specific problems and solutions:

1. **High consumption of computing resources**: Existing 3D diffusion policies rely on a UNet architecture with a large number of parameters, resulting in excessive computational overhead and memory usage.
   - **Solution**: Mamba Policy introduces the XMamba block, which combines Mamba's selective state space model with an attention mechanism. This reduces the parameter count by more than 80% and substantially lowers the demand for computing resources.
2. **Insufficient performance in long-horizon prediction**: Existing models perform poorly on long-horizon prediction tasks.
   - **Solution**: Mamba Policy's model structure improves its handling of long-horizon dependencies, increasing robustness and accuracy in long-horizon scenarios.

### Experimental verification:

- Results on Adroit, DexArt, and MetaWorld show that Mamba Policy outperforms existing methods while significantly reducing computing resource consumption.
- Ablation experiments further verify the contribution of each component (FiLM fusion, the Mamba module, the attention mechanism, etc.) to model performance.

In summary, the paper proposes Mamba Policy to make 3D diffusion policies deployable on resource-constrained devices and to improve their performance on long-horizon prediction tasks.
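The summary names FiLM fusion as one of the ablated components but does not say how it is applied. As a rough illustration only, FiLM (feature-wise linear modulation) in its generic form predicts a per-channel scale and shift from a conditioning vector and applies them to the input features. The sketch below is pure Python with toy, hand-picked weights; every function name, shape, and value is an assumption made for illustration and is not taken from the paper's implementation.

```python
# Generic FiLM (feature-wise linear modulation) sketch in pure Python.
# All names, shapes, and weights are illustrative assumptions,
# not the Mamba Policy authors' code.

def linear(vec, weight, bias):
    """Compute W @ vec + b, with the weight given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift each channel with parameters predicted from cond."""
    gamma = linear(cond, w_gamma, b_gamma)  # per-channel scale
    beta = linear(cond, w_beta, b_beta)     # per-channel shift
    # The same (gamma, beta) pair is applied at every timestep.
    return [[g * x + b for x, g, b in zip(step, gamma, beta)]
            for step in features]

# Toy input: 2 timesteps, 3 channels, and a 2-dimensional condition vector.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
cond = [1.0, -1.0]
w_gamma = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_gamma = [0.0, 0.0, 1.0]
w_beta = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
b_beta = [0.5, 0.0, 0.0]

modulated = film(features, cond, w_gamma, b_gamma, w_beta, b_beta)
```

In a learned model the scale and shift projections would be trained layers, and the conditioning vector would presumably carry the conditional features that the abstract says are integrated with the input; that mapping is a guess rather than something stated in the source.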