Abstract: Imitation learning, e.g., diffusion policy, has been proven effective in various robotic manipulation tasks. However, extensive demonstrations are required for policy robustness and generalization. To reduce the demonstration reliance, we leverage spatial symmetry and propose ET-SEED, an efficient trajectory-level SE(3) equivariant diffusion model for generating action sequences in complex robot manipulation tasks. Further, previous equivariant diffusion models require per-step equivariance in the Markov process, making it difficult to learn a policy under such strong constraints. We theoretically extend equivariant Markov kernels and simplify the condition of the equivariant diffusion process, thereby significantly improving training efficiency for trajectory-level SE(3) equivariant diffusion policy in an end-to-end manner. We evaluate ET-SEED on representative robotic manipulation tasks involving rigid-body, articulated, and deformable objects. Experiments demonstrate superior data efficiency and manipulation proficiency of our proposed method, as well as its ability to generalize to unseen configurations with only a few demonstrations. Website: https://et-seed.github.io/
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **How can robot imitation learning rely less on large amounts of demonstration data while improving the model's spatial generalization?**
Specifically, existing imitation learning methods usually need many demonstrations to learn robust manipulation policies, and performance tends to degrade when the target object's pose falls outside the demonstrated distribution. Some works address this with data augmentation or contrastive learning, but these approaches usually require task-specific knowledge or additional training and offer no theoretical guarantee of spatial generalization.
To solve these problems, the paper proposes **ET-SEED (Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy)**, an efficient trajectory-level SE(3) equivariant diffusion model. ET-SEED exploits spatial symmetry, specifically SE(3) equivariance, to generate action sequences for complex robot manipulation tasks. Compared with previous equivariant diffusion models, it simplifies the conditions of the equivariant diffusion process, which significantly improves training efficiency, and it achieves better data efficiency, manipulation proficiency, and spatial generalization with only a small number of demonstrations.
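For concreteness, the equivariance properties mentioned above can be written out as follows. The notation here is an assumption introduced for illustration and is not taken from the paper: $O$ denotes the observation (for example, a point cloud), $A = (A_1, \dots, A_n)$ with $A_i \in SE(3)$ is the predicted action sequence, $A^{k}$ is the noisy sequence at diffusion step $k$, and $g \in SE(3)$ acts on observations and on every pose in the sequence by left multiplication.

$$
g \cdot (A_1, \dots, A_n) = (g A_1, \dots, g A_n), \qquad
p(g \cdot A \mid g \cdot O) = p(A \mid O) \quad \forall g \in SE(3).
$$

Under the same assumed notation, the per-step requirement attributed to earlier equivariant diffusion models would ask every reverse transition kernel to satisfy

$$
p_\theta\!\left(g \cdot A^{k-1} \mid g \cdot A^{k},\, g \cdot O\right) = p_\theta\!\left(A^{k-1} \mid A^{k},\, O\right) \quad \text{for every step } k .
$$

The first condition is the trajectory-level property the policy should have as a whole. The second is the stronger step-by-step constraint that, according to the summary, ET-SEED relaxes.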
### Main contributions:
1. **Propose ET-SEED**: An efficient trajectory-level SE(3) equivariant diffusion policy defined on the SE(3) manifold, which can learn proficient and generalizable manipulation policies from only a few demonstrations.
2. **Extend the theory of the equivariant diffusion process**: Derive a new SE(3) equivariant diffusion process that simplifies modeling and inference.
3. **Extensive experimental verification**: Simulation and real-world experiments on standard robot manipulation tasks show data efficiency, manipulation proficiency, and spatial generalization that are significantly better than those of the baseline methods.
### Key points for solving the problem:
- **Utilize SE(3) equivariance**: With SE(3) equivariance built in, ET-SEED handles changes in object pose more reliably, which improves spatial generalization.
- **Simplify the equivariant diffusion process**: Theoretical analysis shows that only one denoising step in the whole process needs to be equivariant, rather than every step, which greatly reduces the training difficulty.
- **Defined on the SE(3) manifold**: The diffusion process is defined on the SE(3) manifold rather than in Euclidean space, which makes the model more expressive and helps it converge.
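The last point can be illustrated with a small sketch of why noise is usually added on the manifold instead of to raw matrix entries. This is not the paper's noise schedule or implementation. The helper names (`exp_so3`, `orthogonality_error`) and the noise scale `sigma` are hypothetical, and only the rotation part of a pose is shown, since the translation part is already Euclidean.

```python
import math
import random


def exp_so3(v):
    """Exponential map so(3) -> SO(3): rotation vector to matrix (Rodrigues)."""
    angle = math.sqrt(sum(c * c for c in v))
    if angle < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (c / angle for c in v)
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [[c + x * x * t, x * y * t - z * s, x * z * t + y * s],
            [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
            [z * x * t - y * s, z * y * t + x * s, c + z * z * t]]


def matmul3(A, B):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def orthogonality_error(R):
    """Largest entry of |R^T R - I|; zero for a valid rotation matrix."""
    return max(abs(sum(R[k][i] * R[k][j] for k in range(3)) - (1.0 if i == j else 0.0))
               for i in range(3) for j in range(3))


rng = random.Random(0)
sigma = 0.2                        # illustrative noise scale (assumption)
R0 = exp_so3([0.3, -0.5, 0.8])     # a clean rotation

# Euclidean noise: perturb each matrix entry independently.
R_euclid = [[R0[i][j] + rng.gauss(0.0, sigma) for j in range(3)] for i in range(3)]

# Manifold noise: sample a tangent vector and compose through the exponential map.
xi = [rng.gauss(0.0, sigma) for _ in range(3)]
R_manifold = matmul3(R0, exp_so3(xi))

err_euclid = orthogonality_error(R_euclid)
err_manifold = orthogonality_error(R_manifold)
print(f"Euclidean noise, |R^T R - I| max: {err_euclid:.3e}")
print(f"Manifold noise,  |R^T R - I| max: {err_manifold:.3e}")
```

The entry-wise perturbation leaves the set of rotation matrices, so a network trained on it would have to denoise objects that are not valid poses. The perturbation composed through the exponential map stays a rotation up to floating-point error.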
Through these improvements, ET-SEED improves data efficiency, maintains high performance on unseen object poses, and applies to a variety of complex robot manipulation tasks.
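What generalizing to unseen object poses means for an equivariant policy can be checked numerically. The sketch below is a toy stand-in, not ET-SEED: `equivariant_policy` builds a hypothetical object frame from the first three observed points and places fixed waypoints in it, while `fixed_policy` ignores the observation. All function names and numeric values are assumptions made for illustration. The check compares the policy output on a transformed observation with the transformed output on the original observation.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def rotation(axis, angle):
    """Rodrigues' formula: rotation by `angle` (radians) about `axis`."""
    n = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / n for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    v = 1.0 - c
    return [[c + x * x * v, x * y * v - z * s, x * z * v + y * s],
            [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
            [z * x * v - y * s, z * y * v + x * s, c + z * z * v]]


def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def transform_points(g, points):
    """Apply the rigid transform g to every 3D point."""
    return [[sum(g[i][k] * p[k] for k in range(3)) + g[i][3] for i in range(3)]
            for p in points]


def object_frame(points):
    """Hypothetical equivariant frame: Gram-Schmidt on the first three points."""
    p0, p1, p2 = points[:3]
    u = [b - a for a, b in zip(p0, p1)]
    w = [b - a for a, b in zip(p0, p2)]
    norm_u = math.sqrt(sum(c * c for c in u))
    e1 = [c / norm_u for c in u]
    proj = sum(a * b for a, b in zip(w, e1))
    w = [a - proj * b for a, b in zip(w, e1)]
    norm_w = math.sqrt(sum(c * c for c in w))
    e2 = [c / norm_w for c in w]
    e3 = [e1[1] * e2[2] - e1[2] * e2[1],
          e1[2] * e2[0] - e1[0] * e2[2],
          e1[0] * e2[1] - e1[1] * e2[0]]
    return make_pose([[e1[i], e2[i], e3[i]] for i in range(3)], p0)


# Fixed waypoints expressed in the object frame (illustrative values).
LOCAL_WAYPOINTS = [make_pose(rotation([0, 0, 1], 0.3 * k), [0.1 * k, 0.0, 0.05])
                   for k in range(3)]


def equivariant_policy(points):
    """Toy policy: place the local waypoints in the observed object frame."""
    frame = object_frame(points)
    return [matmul(frame, wp) for wp in LOCAL_WAYPOINTS]


def fixed_policy(points):
    """Baseline that ignores the observation and replays world-frame poses."""
    return list(LOCAL_WAYPOINTS)


def equivariance_error(policy, points, g):
    """Largest entry-wise gap between policy(g . O) and g . policy(O)."""
    lhs = policy(transform_points(g, points))
    rhs = [matmul(g, pose) for pose in policy(points)]
    return max(abs(a - b) for pose_l, pose_r in zip(lhs, rhs)
               for row_l, row_r in zip(pose_l, pose_r) for a, b in zip(row_l, row_r))


rng = random.Random(0)
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(16)]
g = make_pose(rotation([1.0, -2.0, 0.5], 1.1), [0.4, -0.2, 0.7])

err_equivariant = equivariance_error(equivariant_policy, points, g)
err_fixed = equivariance_error(fixed_policy, points, g)
print(f"equivariant policy error: {err_equivariant:.2e}")
print(f"fixed policy error:       {err_fixed:.2e}")
```

For the toy equivariant policy the gap is at floating-point level for any rigid transform `g`, including poses far from those used to define the waypoints. The fixed baseline shows a large gap because its trajectory does not move with the object.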