Abstract: Diffusion models have been widely employed in the field of 3D manipulation due to their efficient capability to learn distributions, allowing for precise prediction of action trajectories. However, diffusion models typically rely on large-parameter UNet backbones as policy networks, which can be challenging to deploy on resource-constrained devices. Recently, the Mamba model has emerged as a promising solution for efficient modeling, offering low computational complexity and strong performance in sequence modeling. In this work, we propose the Mamba Policy, a lighter but stronger policy that reduces the parameter count by over 80% compared to the original policy network while achieving superior performance. Specifically, we introduce the XMamba Block, which effectively integrates input information with conditional features and leverages a combination of Mamba and Attention mechanisms for deep feature extraction. Extensive experiments demonstrate that the Mamba Policy excels on the Adroit, DexArt, and MetaWorld datasets while requiring significantly fewer computational resources. Additionally, we highlight the Mamba Policy's enhanced robustness in long-horizon scenarios compared to baseline methods and explore the performance of various Mamba variants within the Mamba Policy framework. Our project page is at https://andycao1125.github.io/mamba_policy/.

What problem does this paper attempt to address?

The main problem this paper addresses is the difficulty of deploying existing 3D diffusion policy models on resource-constrained devices. Specifically, current 3D diffusion policies (such as DP3) rely on a UNet architecture with a large number of parameters as the policy network. This incurs significant computational overhead and memory usage, which makes these models hard to deploy and run efficiently on edge devices or in other resource-constrained environments. Their performance on long-horizon prediction tasks also leaves room for improvement.

To address these problems, the paper proposes Mamba Policy, a lighter but stronger policy model. Its XMamba block combines Mamba's selective state-space model with the attention mechanism, reducing the parameter count by more than 80% while achieving better performance than existing methods on multiple benchmark datasets (Adroit, DexArt, and MetaWorld). Mamba Policy is also more robust in long-horizon scenarios, which further broadens its potential in complex tasks.

### Summary of specific problems and solutions:

1. **High consumption of computing resources**: Existing 3D diffusion models rely on a UNet architecture with a large number of parameters, resulting in excessive computational overhead and memory usage.
   - **Solution**: Mamba Policy introduces the XMamba block, which combines Mamba's selective state-space model with the attention mechanism. This reduces the parameter count by more than 80% and significantly lowers the demand for computing resources.
2. **Insufficient performance in long-horizon prediction**: Existing models do not perform well on long-horizon prediction tasks.
   - **Solution**: Mamba Policy improves the handling of long-horizon dependencies through its model structure, which improves robustness and accuracy in long-horizon scenarios.

### Experimental verification:

- Experimental results on Adroit, DexArt, and MetaWorld show that Mamba Policy outperforms existing methods while significantly reducing computing resource consumption.
- Ablation experiments further verify the contribution of each module (FiLM fusion, the Mamba module, the attention mechanism, and others) to model performance.

In summary, this paper proposes Mamba Policy to make 3D diffusion policies deployable on resource-constrained devices and to improve their performance on long-horizon prediction tasks.
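To make two of the ingredients named above more concrete, the following is a minimal NumPy sketch of FiLM-style conditioning followed by a toy selective state-space recurrence. It is not the paper's XMamba block: the attention branch is omitted, the function names (`film`, `selective_scan`), weight matrices, and shapes are all hypothetical, and the discretization used (exponential for the state matrix, a first-order approximation for the input matrix) is a common simplification rather than something confirmed by this summary.

```python
import numpy as np

def film(x, cond, w_gamma, w_beta):
    """FiLM-style fusion: scale and shift features using the condition vector.

    x: (T, D) feature sequence, cond: (C,) condition vector,
    w_gamma, w_beta: (C, D) hypothetical linear projections.
    """
    gamma = cond @ w_gamma            # (D,) per-channel scale
    beta = cond @ w_beta              # (D,) per-channel shift
    return gamma * x + beta           # broadcast over the T time steps

def selective_scan(x, a, w_delta, w_b, w_c):
    """Toy selective SSM evaluated as a recurrence, linear in sequence length.

    x: (T, D) input sequence, a: (D, N) negative diagonal state matrix,
    w_delta: (D, D), w_b: (D, N), w_c: (D, N) hypothetical projections that
    make the step size and the B and C matrices depend on the input.
    """
    T, D = x.shape
    N = a.shape[1]
    h = np.zeros((D, N))              # hidden state, one N-vector per channel
    y = np.zeros((T, D))
    for t in range(T):
        delta = np.log1p(np.exp(x[t] @ w_delta))   # softplus step size, (D,)
        b = x[t] @ w_b                              # input-dependent B, (N,)
        c = x[t] @ w_c                              # input-dependent C, (N,)
        a_bar = np.exp(delta[:, None] * a)          # discretized A, (D, N)
        b_bar = delta[:, None] * b[None, :]         # simplified discretized B
        h = a_bar * h + b_bar * x[t][:, None]       # state update
        y[t] = h @ c                                # readout, (D,)
    return y

# Small demo with random, untrained weights (illustrative sizes only).
rng = np.random.default_rng(0)
T, D, N, C = 8, 4, 3, 5
x = rng.normal(size=(T, D))           # stand-in for noisy action features
cond = rng.normal(size=C)             # stand-in for observation/timestep embedding
fused = film(x, cond, rng.normal(size=(C, D)), rng.normal(size=(C, D)))

a = -np.exp(rng.normal(size=(D, N)))  # negative entries keep the state stable
w_delta = 0.1 * rng.normal(size=(D, D))
w_b = rng.normal(size=(D, N))
w_c = rng.normal(size=(D, N))
y = selective_scan(fused, a, w_delta, w_b, w_c)
print(y.shape)  # (8, 4)
```

The loop visits each time step once, so the cost grows linearly with the sequence length, in contrast to the quadratic cost of full self-attention. Because the step size and the B and C matrices are computed from the current input, the recurrence can decide per step how much past state to retain, which is the "selective" part of the selective state-space model.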

Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models

Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning

LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba

Hierarchical Diffusion Policy: manipulation trajectory generation via contact guidance

One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving

DNAct: Diffusion Guided Multi-Task 3D Policy Learning

Multi-task Manipulation Policy Modeling with Visuomotor Latent Diffusion

Scaling Diffusion Policy in Transformer to 1 Billion Parameters for Robotic Manipulation

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL

3D Diffuser Actor: Policy Diffusion with 3D Scene Representations

OccMamba: Semantic Occupancy Prediction with State Space Models

A Survey of Mamba

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration

A Survey on Visual Mamba

Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation