Abstract: Imitation learning, e.g., diffusion policy, has been proven effective in various robotic manipulation tasks. However, extensive demonstrations are required for policy robustness and generalization. To reduce this reliance on demonstrations, we leverage spatial symmetry and propose ET-SEED, an efficient trajectory-level SE(3) equivariant diffusion model for generating action sequences in complex robot manipulation tasks. Further, previous equivariant diffusion models require per-step equivariance in the Markov process, making it difficult to learn a policy under such strong constraints. We theoretically extend equivariant Markov kernels and simplify the condition of the equivariant diffusion process, thereby significantly improving training efficiency for a trajectory-level SE(3) equivariant diffusion policy in an end-to-end manner. We evaluate ET-SEED on representative robotic manipulation tasks involving rigid-body, articulated, and deformable objects. Experiments demonstrate the superior data efficiency and manipulation proficiency of our proposed method, as well as its ability to generalize to unseen configurations with only a few demonstrations. Website: https://et-seed.github.io/

What problem does this paper attempt to address?

The paper asks: **how can robot imitation learning depend less on large amounts of demonstration data while generalizing better across spatial configurations?** Existing imitation learning methods usually need many demonstrations to learn robust manipulation policies, and their performance tends to degrade when the pose of the target object falls outside the demonstrated distribution. Some works address this with data augmentation or contrastive learning, but these approaches usually require task-specific knowledge or additional training, and they offer no theoretical guarantee of spatial generalization.

To address this, the paper proposes **ET-SEED (Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy)**, a trajectory-level SE(3) equivariant diffusion model that exploits spatial symmetry to generate action sequences for complex robot manipulation tasks. Compared with previous equivariant diffusion models, ET-SEED simplifies the conditions on the equivariant diffusion process, which significantly improves training efficiency, and it achieves better data efficiency, manipulation proficiency, and spatial generalization with only a small number of demonstrations.

### Main contributions:

1. **Propose ET-SEED**: An efficient trajectory-level SE(3) equivariant diffusion policy defined on the SE(3) manifold, which can produce proficient and generalizable manipulation policies from only a few demonstrations.
2. **Extend the theory of the equivariant diffusion process**: Derive a new SE(3) equivariant diffusion process that simplifies modeling and inference.
3. **Extensive experimental verification**: Simulation and real-world experiments on standard robot manipulation tasks demonstrate data efficiency, manipulation proficiency, and spatial generalization that significantly exceed those of the baseline methods.

### Key points for solving the problem:

- **Utilize SE(3) equivariance**: With SE(3) equivariance built in, ET-SEED handles changes in object pose better, which improves spatial generalization.
- **Simplify the equivariant diffusion process**: The theoretical analysis shows that only one equivariant step is required in the entire denoising process, which greatly reduces the training difficulty.
- **Defined on the SE(3) manifold**: The diffusion process is defined on the SE(3) manifold instead of in Euclidean space, which makes the model more expressive and helps it converge.

With these improvements, ET-SEED improves data efficiency and maintains high performance on unseen object poses, and it is suitable for a variety of complex robot manipulation tasks.
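The equivariance property discussed above can be checked numerically. The following is a minimal NumPy sketch, not the paper's network or training procedure: `equivariant_frame`, `invariant_offsets`, and `toy_policy` are hypothetical names introduced only for illustration, and the toy assumes the point cloud has a fixed point ordering. It composes an SE(3)-invariant stage with a single SE(3)-equivariant stage and verifies that the resulting pose trajectory satisfies policy(g · X) = g · policy(X) for a random rigid transform g.

```python
import numpy as np


def random_se3(rng):
    """Sample a random rigid transform as a 4x4 homogeneous matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0  # flip one axis so that det(q) = +1 (proper rotation)
    g = np.eye(4)
    g[:3, :3] = q
    g[:3, 3] = rng.normal(size=3)
    return g


def transform_points(g, pts):
    """Apply the rigid transform g to an (N, 3) point cloud."""
    return pts @ g[:3, :3].T + g[:3, 3]


def equivariant_frame(pts):
    """Toy equivariant stage (hypothetical): centroid plus Gram-Schmidt frame.

    Assumes a fixed point ordering so that pts[0] and pts[1] are meaningful.
    """
    c = pts.mean(axis=0)
    v1, v2 = pts[0] - c, pts[1] - c
    e1 = v1 / np.linalg.norm(v1)
    v2 = v2 - (v2 @ e1) * e1
    e2 = v2 / np.linalg.norm(v2)
    e3 = np.cross(e1, e2)
    frame = np.eye(4)
    frame[:3, :3] = np.stack([e1, e2, e3], axis=1)
    frame[:3, 3] = c
    return frame


def invariant_offsets(pts, horizon=4):
    """Toy invariant stage (hypothetical): pose offsets from a distance only."""
    d = np.linalg.norm(pts[0] - pts[1])  # unchanged by any rigid transform
    offsets = np.tile(np.eye(4), (horizon, 1, 1))
    for t in range(horizon):
        a = 0.1 * d * (t + 1)
        offsets[t, :3, :3] = np.array(
            [[np.cos(a), -np.sin(a), 0.0],
             [np.sin(a), np.cos(a), 0.0],
             [0.0, 0.0, 1.0]]
        )
        offsets[t, :3, 3] = [0.0, 0.0, d * (t + 1)]
    return offsets


def toy_policy(pts):
    """Invariant offsets expressed in one equivariant frame: (horizon, 4, 4)."""
    return equivariant_frame(pts) @ invariant_offsets(pts)


rng = np.random.default_rng(0)
pts = rng.normal(size=(16, 3))
g = random_se3(rng)

traj = toy_policy(pts)
traj_from_moved_scene = toy_policy(transform_points(g, pts))
equivariance_error = float(np.abs(traj_from_moved_scene - g @ traj).max())
print("max equivariance error:", equivariance_error)
```

The error is at floating-point precision because the invariant stage ignores g entirely, so the single equivariant frame is the only place where the transform enters the output. This mirrors, in a very simplified form, the idea that equivariance of the whole process does not require every step to be equivariant.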

ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy
