Vision Language Models in Autonomous Driving: A Survey and Outlook

Xingcheng Zhou, Mingyu Liu, Ekim Yurtsever, Bare Luka Zagar, Walter Zimmer, Hu Cao, Alois C. Knoll
2024-06-21
Abstract: The applications of Vision-Language Models (VLMs) in the field of Autonomous Driving (AD) have attracted widespread attention due to their outstanding performance and the ability to leverage Large Language Models (LLMs). By incorporating language data, driving systems can gain a better understanding of real-world environments, thereby enhancing driving safety and efficiency. In this work, we present a comprehensive and systematic survey of the advances in vision-language models in this domain, encompassing perception and understanding, navigation and planning, decision-making and control, end-to-end autonomous driving, and data generation. We introduce the mainstream VLM tasks in AD and the commonly utilized metrics. Additionally, we review current studies and applications in various areas and summarize the existing language-enhanced autonomous driving datasets thoroughly. Lastly, we discuss the benefits and challenges of VLMs in AD and provide researchers with the current research gaps and future trends.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how Vision-Language Models (VLMs) can be applied to Autonomous Driving (AD) and examines their potential to improve autonomous driving technology. Although current autonomous driving systems have made significant progress in perception and prediction, they still face challenges in handling complex dynamic environments, interpreting decision-making processes, and following human instructions. Against this background, the paper makes the following major contributions:

1. **Comprehensive Review**: Surveys large vision-language models in autonomous driving, categorizing existing research by VLM type and application area.
2. **Tasks and Metrics**: Consolidates the mainstream vision-language tasks in autonomous driving and their commonly used evaluation metrics.
3. **Dataset Analysis**: Systematically summarizes and analyzes existing classical and language-enhanced autonomous driving datasets.
4. **Potential Applications**: Explores the potential applications and technological advancements of VLMs in autonomous driving.
5. **Discussion and Outlook**: Discusses in depth the benefits, challenges, and research gaps in this field, and points out future research trends.

Through these contributions, the paper aims to fill the lack of a systematic summary and discussion of VLM applications in autonomous driving, giving researchers a comprehensive overview that supports further development of the field.