Interactive Navigation in Environments with Traversable Obstacles Using Large Language and Vision-Language Models

Zhen Zhang, Anran Lin, Chun Wai Wong, Xiangyu Chu, Qi Dou, K. W. Samuel Au
2024-03-13
Abstract: This paper proposes an interactive navigation framework that uses large language and vision-language models to let robots navigate in environments with traversable obstacles. We use the large language model GPT-3.5 and the open-set vision-language model Grounding DINO to create an action-aware costmap for effective path planning without fine-tuning. With these large models, we achieve an end-to-end system that goes from textual instructions such as "Can you pass through the curtains to deliver medicines to me?" to bounding boxes (e.g., curtains) with action-aware attributes. The bounding boxes are used to segment LiDAR point clouds into traversable and untraversable parts, from which an action-aware costmap is constructed to generate a feasible path. The pre-trained large models generalize well and require no additional annotated training data, enabling fast deployment in interactive navigation tasks. For verification, we instruct the robot to traverse multiple traversable objects, such as curtains and grass. In addition, traversing curtains was tested in a medical scenario. All experimental results demonstrated the proposed framework's effectiveness and adaptability to diverse environments.
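The box-based segmentation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the LiDAR points have already been transformed into the camera frame (z pointing forward), the camera is an ideal pinhole with known intrinsics fx, fy, cx, cy, and each detected box carries a boolean action-aware attribute. The `Box` type and the functions `project` and `label_points` are hypothetical names. Treating every point outside a traversable box as an obstacle is a conservative choice made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical 2D detection in pixel coordinates with an action-aware attribute."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    traversable: bool  # e.g. True for a "curtain" box (assumed attribute format)


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel (u, v).

    Returns None for points at or behind the image plane (z <= 0).
    """
    x, y, z = point
    if z <= 0:
        return None
    return fx * x / z + cx, fy * y / z + cy


def label_points(points, boxes, fx, fy, cx, cy):
    """Split points into (traversable, untraversable) lists.

    A point is traversable only if it projects inside a box whose attribute is
    traversable; all other points are conservatively kept as obstacles.
    """
    traversable, untraversable = [], []
    for p in points:
        uv = project(p, fx, fy, cx, cy)
        hit = uv is not None and any(
            b.traversable
            and b.x_min <= uv[0] <= b.x_max
            and b.y_min <= uv[1] <= b.y_max
            for b in boxes
        )
        (traversable if hit else untraversable).append(p)
    return traversable, untraversable


# Usage: one traversable box (e.g. a curtain) and three sample points.
boxes = [Box(200, 100, 400, 300, traversable=True)]
points = [(0.0, 0.0, 2.0), (3.0, 0.0, 2.0), (0.0, 0.0, -1.0)]
trav, untrav = label_points(points, boxes, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(trav)    # [(0.0, 0.0, 2.0)]
print(untrav)  # [(3.0, 0.0, 2.0), (0.0, 0.0, -1.0)]
```

Because a 2D box carries no depth, every point that projects into a traversable box is labeled traversable regardless of its distance; a real system would likely need extra filtering to handle such cases.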
Robotics, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses interactive robot navigation in environments containing traversable obstacles such as curtains and grass. Specifically, it proposes an interactive navigation framework based on a large language model (GPT-3.5) and a vision-language model (Grounding DINO) that lets a robot plan a feasible path from a human natural-language instruction (for example, "Can you pass through the curtain to bring me medicine?") and navigate accordingly. Traditional navigation systems usually treat all obstacles as untraversable, which limits a robot's flexibility and adaptability. By introducing action-aware attributes, the framework distinguishes traversable from untraversable obstacles, improving the robot's navigation in complex environments. The main contributions of the paper are:
1. An interactive navigation framework based on pre-trained large models that enables robots to plan feasible paths in environments containing traversable objects.
2. Extraction of action-aware attributes, in addition to landmarks, from text instructions; these attributes guide the segmentation of sensor data and the construction of action-aware costmaps.
3. Experimental verification of the framework's effectiveness and generalization across different traversable objects and scenarios.
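Once obstacle cells are labeled, an action-aware costmap can be illustrated as a grid in which untraversable cells are impassable and traversable cells are passable at an extra cost. The sketch below is an assumption-laden illustration, not the paper's method: the cost values `FREE_COST` and `TRAVERSABLE_COST`, the helper names `build_costmap` and `dijkstra`, and the use of a 4-connected grid Dijkstra search as the planner are all hypothetical stand-ins. The example compares a costmap that marks a curtain in a doorway as traversable with a conventional one that treats it as an obstacle.

```python
import heapq
import math

FREE_COST = 1.0         # cost of entering an empty cell (hypothetical value)
TRAVERSABLE_COST = 5.0  # passable but penalized, e.g. a curtain (hypothetical value)
LETHAL = math.inf       # untraversable obstacle: never entered


def build_costmap(width, height, traversable_cells, untraversable_cells):
    """Rasterize labeled (row, col) cells into an action-aware costmap.

    Untraversable labels are written last, so they win if a cell is in both sets.
    """
    grid = [[FREE_COST] * width for _ in range(height)]
    for r, c in traversable_cells:
        grid[r][c] = TRAVERSABLE_COST
    for r, c in untraversable_cells:
        grid[r][c] = LETHAL
    return grid


def dijkstra(grid, start, goal):
    """4-connected Dijkstra; returns (total_cost, path) or (math.inf, []) if unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            # Walk parent links back to the start to recover the path.
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return d, path[::-1]
        if d > dist.get(cell, math.inf):
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not math.isinf(grid[nr][nc]):
                nd = d + grid[nr][nc]  # pay the cost of the cell being entered
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    return math.inf, []


# Usage: a wall in column 2 with a curtain in the doorway at (1, 2).
wall = [(0, 2), (2, 2)]
curtain = [(1, 2)]
aware = build_costmap(5, 3, traversable_cells=curtain, untraversable_cells=wall)
print(dijkstra(aware, (1, 0), (1, 4)))
# (8.0, [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4)])
naive = build_costmap(5, 3, traversable_cells=[], untraversable_cells=wall + curtain)
print(dijkstra(naive, (1, 0), (1, 4)))
# (inf, [])
```

With the action-aware costmap the planner routes through the curtain cell, while the conventional costmap leaves the goal unreachable. Giving traversable cells a cost above `FREE_COST` keeps the planner from pushing through them when an open route of similar length exists.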