GRAPE: Generalizing Robot Policy via Preference Alignment

Zijian Zhang, Kaiyuan Zheng, Zhaorun Chen, Joel Jang, Yi Li, Chaoqi Wang, Mingyu Ding, Dieter Fox, Huaxiu Yao
2024-11-29
Abstract: Despite the recent advancements of vision-language-action (VLA) models on a variety of robotics tasks, they suffer from critical issues such as poor generalizability to unseen tasks, due to their reliance on behavior cloning exclusively from successful rollouts. Furthermore, they are typically fine-tuned to replicate demonstrations collected by experts under different settings, thus introducing distribution bias and limiting their adaptability to diverse manipulation objectives, such as efficiency, safety, and task completion. To bridge this gap, we introduce GRAPE: Generalizing Robot Policy via Preference Alignment. Specifically, GRAPE aligns VLAs on a trajectory level and implicitly models reward from both successful and failure trials to boost generalizability to diverse tasks. Moreover, GRAPE breaks down complex manipulation tasks into independent stages and automatically guides preference modeling through customized spatiotemporal constraints with keypoints proposed by a large vision-language model. Notably, these constraints are flexible and can be customized to align the model with varying objectives, such as safety, efficiency, or task success. We evaluate GRAPE across a diverse array of tasks in both real-world and simulated environments. Experimental results demonstrate that GRAPE enhances the performance of state-of-the-art VLA models, increasing success rates on in-domain and unseen manipulation tasks by 51.79% and 60.36%, respectively. Additionally, GRAPE can be aligned with various objectives, such as safety and efficiency, reducing collision rates by 44.31% and rollout step-length by 11.15%, respectively. All code, models, and data are available at https://grape-vla.github.io/
Robotics, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses the limited generalization of current vision-language-action (VLA) models on robotic tasks. Although existing VLA models perform well on a variety of tasks, they generalize poorly to unseen tasks, environments, objects, or semantic contexts. The main reason is that they rely on behavior cloning, that is, learning by imitating successful trajectories, without developing an overall understanding of task goals or an awareness of potential failure modes. In addition, these models are usually fine-tuned on demonstration data collected by experts in different settings, which introduces distribution bias and limits their adaptability to diverse manipulation objectives (such as efficiency, safety, and task completion). To address these problems, the paper proposes GRAPE (Generalizing Robot Policy via Preference Alignment), a method that improves the generalization of VLA models through preference alignment. GRAPE does this in three ways:
1. **Trajectory-level preference alignment**: GRAPE implicitly models reward from both successful and failed trials, rather than from successful trials alone, to improve generalization across tasks.
2. **Multi-stage decomposition**: GRAPE decomposes complex manipulation tasks into independent stages and automatically guides preference modeling through spatiotemporal constraints, built on keypoints proposed by a large vision-language model.
3. **Flexible goal alignment**: The constraints are flexible and can be customized to different manipulation objectives (such as safety, efficiency, or task success).
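
Trajectory-level preference alignment can be illustrated with a DPO-style loss computed over whole rollouts. The text here does not give the exact objective, so the following is only a minimal sketch under that assumption: the function names, the `beta` value, and the frozen reference policy are hypothetical choices for illustration, not the paper's implementation.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def trajectory_log_prob(step_log_probs):
    """Log-probability of a whole trajectory: the sum of per-step action log-probs."""
    return sum(step_log_probs)


def trajectory_preference_loss(policy_win, policy_lose, ref_win, ref_lose, beta=0.1):
    """DPO-style loss for one (preferred, rejected) trajectory pair.

    Each argument is a list of per-step action log-probabilities, under either
    the policy being trained or a frozen reference policy (an assumption here).
    """
    # Implicit reward of a trajectory: beta * (log pi - log pi_ref), summed over steps.
    reward_win = beta * (trajectory_log_prob(policy_win) - trajectory_log_prob(ref_win))
    reward_lose = beta * (trajectory_log_prob(policy_lose) - trajectory_log_prob(ref_lose))
    margin = reward_win - reward_lose
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


# Hypothetical per-step log-probs for a successful and a failed rollout.
loss = trajectory_preference_loss(
    policy_win=[-0.5, -0.5], policy_lose=[-2.0, -2.0],
    ref_win=[-1.0, -1.0], ref_lose=[-1.0, -1.0],
)
```

When the policy matches the reference, the margin is zero and the loss is log 2; the loss falls as the policy puts relatively more probability on the preferred trajectory than on the rejected one. This is how a failed trial can contribute a training signal that plain behavior cloning would discard.
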
Through these methods, GRAPE improves the success rate of VLA models on new tasks, and it can reduce the collision rate and the number of steps per rollout while maintaining a high success rate, which improves the efficiency and safety of robot operation. Experimental results show that, compared with existing state-of-the-art VLA models, GRAPE increases the success rate by 51.79% on in-domain tasks and by 60.36% on unseen tasks. When aligned for safety it reduces the collision rate by 44.31%, and when aligned for efficiency it reduces the rollout step-length by 11.15%.
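
To make the idea of customizable objectives concrete, the sketch below ranks sampled rollouts by a weighted sum of per-stage costs and picks a preferred and a rejected trajectory. It is an illustration under stated assumptions, not the paper's scoring rule: the `Rollout` class, the cost names `collision` and `steps`, the weights, and the additive score are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    name: str
    success: bool
    stage_costs: list  # one dict of costs per stage, e.g. {"collision": 0, "steps": 10}


def score(rollout, weights, success_bonus=1.0):
    """Success bonus minus the weighted costs accumulated over all stages."""
    penalty = sum(
        weights.get(key, 0.0) * value
        for stage in rollout.stage_costs
        for key, value in stage.items()
    )
    return (success_bonus if rollout.success else 0.0) - penalty


def preference_pair(rollouts, weights):
    """Return (preferred, rejected): the highest- and lowest-scoring rollouts."""
    ranked = sorted(rollouts, key=lambda r: score(r, weights), reverse=True)
    return ranked[0], ranked[-1]


# Hypothetical rollouts of a two-stage task.
rollouts = [
    Rollout("a", True, [{"collision": 0, "steps": 10}, {"collision": 1, "steps": 8}]),
    Rollout("b", True, [{"collision": 0, "steps": 14}, {"collision": 0, "steps": 12}]),
    Rollout("c", False, [{"collision": 1, "steps": 20}]),
]

# Changing the weights changes which rollout is preferred.
safety_weights = {"collision": 1.0, "steps": 0.0}
efficiency_weights = {"collision": 0.0, "steps": 0.01}
safe_best, safe_worst = preference_pair(rollouts, safety_weights)
fast_best, fast_worst = preference_pair(rollouts, efficiency_weights)
```

With the safety weights the collision-free rollout `b` is preferred, while with the efficiency weights the shorter rollout `a` is preferred; the failed rollout `c` is rejected in both cases. Pairs selected this way could then feed a trajectory-level preference loss.
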