Research on Task Decomposition and Motion Trajectory Optimization of Robotic Arm Based on VLA Large Model

Wentao Lu, Xiaofeng Wu, Shuyong Gao, Wei He, Qing Zhao, Lunning Zhang, Maodong Li, Wenqiang Zhang
DOI: https://doi.org/10.1109/icmlca63499.2024.10754333
2024-01-01
Abstract: With the continuous advancement of intelligent robotics, efficient execution of multiple tasks in complex environments has emerged as a critical research area. In this paper, we propose a framework based on visual-linguistic alignment and visual-language-action (VLA) models to enhance the task decomposition and motion trajectory optimization capabilities of robotic arms. Our model leverages multimodal fusion of visual information and natural language instructions to automatically understand and decompose complex tasks. While existing VLA models demonstrate strong integration of vision, language, and motion, they often depend heavily on the diversity and coverage of training data, which limits their generalization to diverse tasks, especially in cross-domain scenarios. Furthermore, the simplistic treatment of inputs by current VLA models can hinder deep understanding of visual content and create ambiguities in interpreting abstract or unclear linguistic commands, which reduces the accuracy of task decomposition and execution. To address these limitations, this paper proposes a more robust multimodal fusion mechanism that integrates a hybrid module combining multiple cross-attention and self-attention mechanisms. By incorporating appropriate image and language encoding modalities, the model improves the processing of complex natural language instructions, enhances multimodal fusion, and refines task decomposition and execution. This provides a significant improvement over traditional VLA models, while also enhancing generalization across diverse task scenarios. Experimental results demonstrate that the model achieves faster response times and greater robustness, particularly in noisy environments. These advancements make the model more suitable for practical applications and lay a solid foundation for the future deployment of single-arm robots in varied task environments.
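As a rough illustration of what a hybrid module combining cross-attention and self-attention can look like, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify the architecture, so everything here is an assumption: the function names (`attention`, `fuse`), the token counts, the feature size, the absence of learned projection matrices, and the ordering (language tokens first attend to visual tokens, then the joint sequence attends to itself).

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    # Scaled dot-product attention; returns the attended output and the weights.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


def fuse(lang_tokens, vis_tokens):
    # Hypothetical fusion step (an assumption, not taken from the paper).
    # 1) Cross-attention: each language token queries the visual tokens.
    cross_out, cross_w = attention(lang_tokens, vis_tokens, vis_tokens)
    lang_ctx = lang_tokens + cross_out  # residual connection
    # 2) Self-attention over the concatenated language + visual sequence.
    joint = np.concatenate([lang_ctx, vis_tokens], axis=0)
    self_out, self_w = attention(joint, joint, joint)
    return joint + self_out, cross_w, self_w


# Toy inputs: 4 language tokens and 6 visual tokens, 8 features each.
rng = np.random.default_rng(0)
lang = rng.standard_normal((4, 8))
vis = rng.standard_normal((6, 8))
fused, cross_w, self_w = fuse(lang, vis)
print(fused.shape, cross_w.shape, self_w.shape)
```

In a real model the queries, keys, and values would pass through learned linear projections and multiple heads; they are omitted here to keep the data flow between the two modalities visible.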