Language Prompt for Autonomous Driving

Dongming Wu, Wencheng Han, Tiancai Wang, Yingfei Liu, Xiangyu Zhang, Jianbing Shen

2023-09-08

Abstract: A new trend in the computer vision community is to capture objects of interest following flexible human commands expressed as natural language prompts. However, progress in using language prompts in driving scenarios is held back by the scarcity of paired prompt-instance data. To address this challenge, we propose the first object-centric language prompt set for driving scenes within 3D, multi-view, and multi-frame space, named NuPrompt. It expands the nuScenes dataset by constructing a total of 35,367 language descriptions, each referring to an average of 5.3 object tracks. Based on the object-text pairs from the new benchmark, we formulate a new prompt-based driving task, i.e., employing a language prompt to predict the described object trajectory across views and frames. Furthermore, we provide a simple end-to-end baseline model based on Transformer, named PromptTrack. Experiments show that our PromptTrack achieves impressive performance on NuPrompt. We hope this work can provide more new insights for the autonomous driving community. Dataset and code will be made public at https://github.com/wudongming97/Prompt4Driving.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses object detection and tracking driven by natural language prompts in autonomous driving scenarios. The computer vision community has made progress in using flexible human instructions, given in natural language, to capture objects of interest, but applying such prompts to driving scenes is held back by the scarcity of paired language descriptions and object instances. To close this gap, the paper proposes NuPrompt, the first object-centric language prompt set for driving scenes in 3D, multi-view, and multi-frame space. NuPrompt extends the nuScenes dataset with 35,367 language descriptions, each referring to an average of 5.3 object tracks. Based on these object-text pairs, the authors define a new prompt-based driving task: given a language prompt, predict the trajectories of the described objects across views and frames. The paper also proposes PromptTrack, a Transformer-based end-to-end baseline for this task that fuses cross-modal features to predict the objects a prompt refers to. The authors report that PromptTrack performs well on NuPrompt, and they hope the work will offer new insights for the autonomous driving community.
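
The prompt-based task described above can be illustrated with a minimal sketch. It assumes a hypothetical record layout (`Box3D`, string track IDs, and a set of referred track IDs per prompt); none of these names come from the NuPrompt release or the PromptTrack code, and the sketch only shows how a prompt's referred objects map to per-object trajectories across frames, not how a model predicts them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Center = Tuple[float, float, float]


@dataclass(frozen=True)
class Box3D:
    """One annotated 3D box in one frame (hypothetical layout)."""
    frame: int
    track_id: str
    center: Center


def referred_trajectories(
    boxes: List[Box3D], referred: Set[str]
) -> Dict[str, List[Tuple[int, Center]]]:
    """Group the boxes of prompt-referred tracks into per-object trajectories."""
    trajectories: Dict[str, List[Tuple[int, Center]]] = defaultdict(list)
    # Visit boxes in frame order so each trajectory is chronological.
    for box in sorted(boxes, key=lambda b: b.frame):
        if box.track_id in referred:
            trajectories[box.track_id].append((box.frame, box.center))
    return dict(trajectories)


# Illustrative data only: two frames, three tracked objects.
boxes = [
    Box3D(0, "car_1", (10.0, 2.0, 0.5)),
    Box3D(0, "ped_7", (4.0, -1.0, 0.9)),
    Box3D(1, "car_1", (11.5, 2.1, 0.5)),
    Box3D(1, "car_2", (20.0, 5.0, 0.5)),
]
prompt = "the cars ahead of the ego vehicle"  # made-up example prompt
referred = {"car_1", "car_2"}  # one prompt may refer to several tracks

trajectories = referred_trajectories(boxes, referred)
```

Here a single prompt refers to more than one track, which mirrors the abstract's statement that each description covers several object tracks on average; a model for the task would have to output such trajectories from the prompt and the camera inputs alone.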

UnstrPrompt: Large Language Model Prompt for Driving in Unstructured Scenarios

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

Promptable Closed-loop Traffic Simulation

A Text Prompt-Based Approach for Zero-Shot Corner Case Object Detection in Autonomous Driving

Visual In-Context Prompting

AdaPrompt: Adaptive Model Training for Prompt-based NLP

Unified Vision and Language Prompt Learning

Prompt3D: Random Prompt Assisted Weakly-Supervised 3D Object Detection

NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario

Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation

Visual Prompt Multi-Modal Tracking

DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions

Prompt Space Optimizing Few-shot Reasoning Success with Large Language Models

Dynamic Prompting: A Unified Framework for Prompt Tuning

Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models

X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation

Learning Domain Invariant Prompt for Vision-Language Models

NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation

PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization