Prospective Role of Foundation Models in Advancing Autonomous Vehicles

Jianhua Wu, Bingzhao Gao, Jincheng Gao, Jianhao Yu, Hongqing Chu, Qiankun Yu, Xun Gong, Yi Chang, H. Eric Tseng, Hong Chen, Jie Chen

2024-05-17

Abstract: With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs) such as GPT and Sora have achieved remarkable results in many fields, including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can enhance scene understanding and reasoning. By pre-training on rich linguistic and visual data, FMs can understand and interpret the various elements of a driving scene and provide cognitive reasoning that yields linguistic and action instructions for driving decisions and planning. Furthermore, FMs can augment data based on their understanding of driving scenarios, generating feasible scenes for the rare occurrences in the long-tail distribution that are unlikely to be encountered during routine driving and data collection. This augmentation can in turn improve the accuracy and reliability of autonomous driving systems. Another testament to the potential of FMs lies in World Models, exemplified by the DREAMER series, which showcase the ability to comprehend physical laws and dynamics. Learning from massive data under the paradigm of self-supervised learning, World Models can generate unseen yet plausible driving environments, improving the prediction of road users' behaviors and enabling the off-line training of driving strategies. In this paper, we synthesize the applications and future trends of FMs in autonomous driving. By utilizing the powerful capabilities of FMs, we strive to tackle the potential issues stemming from the long-tail distribution in autonomous driving, consequently advancing overall safety in this domain.

Computer Vision and Pattern Recognition, Artificial Intelligence, Robotics

What problem does this paper attempt to address?

The paper explores the potential applications and future trends of Foundation Models (FMs) in autonomous driving. Specifically, it attempts to address the following issues:

1. **Enhancing Scene Understanding and Reasoning Abilities**: By pre-training on large amounts of language and visual data, FMs can better understand and interpret the various elements of a driving scene, providing cognitive reasoning that generates language and action instructions to assist driving decisions and planning.
2. **Handling Long-Tail Distribution Problems**: FMs can augment datasets based on their understanding of driving scenes, providing data for rare scenarios that are unlikely to be encountered in regular driving. This helps improve the accuracy and reliability of autonomous driving systems.
3. **Developing World Models**: Through self-supervised learning from large amounts of data, World Models can learn physical laws and dynamics and generate unseen but plausible driving environments. This improves the prediction of road users' behavior and supports offline training of driving strategies.

By leveraging the capabilities of FMs, the paper aims to address the issues arising from long-tail distributions in autonomous driving, thereby improving overall safety in this domain. The paper also details how language models, vision models, and their combinations can be applied to strengthen the scene understanding and reasoning of autonomous driving systems, and discusses research progress on using these models directly to generate specific control instructions.

Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities

Applications of Large Scale Foundation Models for Autonomous Driving

A Survey for Foundation Models in Autonomous Driving

Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

World Models for Autonomous Driving: An Initial Survey

Parallel Driving with Big Models and Foundation Intelligence in Cyber-Physical-Social Spaces

Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey

Foundation Models for Rapid Autonomy Validation

LLM4Drive: A Survey of Large Language Models for Autonomous Driving

Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving

On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence

ADriver-I: A General World Model for Autonomous Driving

When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions

Multimodal Perception and Decision-Making Systems for Complex Roads Based on Foundation Models

Sora-Based Parallel Vision for Smart Sensing of Intelligent Vehicles: from Foundation Models to Foundation Intelligence

Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

Safedrive Dreamer: Navigating Safety-critical Scenarios in Autonomous Driving with World Models

Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving