Vision Language Models in Autonomous Driving: A Survey and Outlook

Xingcheng Zhou, Mingyu Liu, Ekim Yurtsever, Bare Luka Zagar, Walter Zimmer, Hu Cao, Alois C. Knoll

2024-06-21

Abstract: The applications of Vision-Language Models (VLMs) in the field of Autonomous Driving (AD) have attracted widespread attention due to their outstanding performance and the ability to leverage Large Language Models (LLMs). By incorporating language data, driving systems can gain a better understanding of real-world environments, thereby enhancing driving safety and efficiency. In this work, we present a comprehensive and systematic survey of the advances in vision language models in this domain, encompassing perception and understanding, navigation and planning, decision-making and control, end-to-end autonomous driving, and data generation. We introduce the mainstream VLM tasks in AD and the commonly utilized metrics. Additionally, we review current studies and applications in various areas and summarize the existing language-enhanced autonomous driving datasets thoroughly. Lastly, we discuss the benefits and challenges of VLMs in AD and provide researchers with the current research gaps and future trends.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The paper surveys the application of Vision-Language Models (VLMs) in Autonomous Driving (AD) and explores their potential to improve autonomous driving technology. Although current autonomous driving systems have made significant progress in perception and prediction, they still struggle to handle complex dynamic environments, explain their decisions, and follow human instructions. To address these issues, the paper makes the following major contributions:

1. **Comprehensive Review**: Surveys large vision-language models in autonomous driving, categorizing existing research by VLM type and application area.
2. **Tasks and Metrics**: Consolidates the mainstream vision-language tasks in autonomous driving and their commonly used evaluation metrics.
3. **Dataset Analysis**: Systematically summarizes and analyzes existing classical and language-enhanced autonomous driving datasets.
4. **Potential Applications**: Explores the potential applications and technological advancements of VLMs in autonomous driving.
5. **Discussion and Outlook**: Discusses in depth the advantages, challenges, and research gaps in this field, and points out future research trends.

Through these efforts, the paper aims to fill the gap left by the lack of a systematic summary and discussion of VLMs in autonomous driving, giving researchers a comprehensive overview to support further development of the field.

Vision Language Models in Autonomous Driving and Intelligent Transportation Systems

A Survey on Multimodal Large Language Models for Autonomous Driving

VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision

XLM for Autonomous Driving Systems: A Comprehensive Review

LLM4Drive: A Survey of Large Language Models for Autonomous Driving

Large Language Models for Human-like Autonomous Driving: A Survey

Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment

Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions

SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving

Vision-Language Models for Vision Tasks: A Survey

A Survey on Large Language Model-empowered Autonomous Driving

Empowering Autonomous Driving with Large Language Models: A Safety Perspective

V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models

An Introduction to Vision-Language Modeling

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

Visual Adversarial Attack on Vision-Language Models for Autonomous Driving

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

On-Board Vision-Language Models for Personalized Autonomous Vehicle Motion Control: System Design and Real-World Validation

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models