DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K. Wong, Zhenguo Li, Hengshuang Zhao

2024-03-15

Abstract: Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos. This study seeks to extend the application of MLLMs to the realm of autonomous driving by introducing DriveGPT4, a novel interpretable end-to-end autonomous driving system based on LLMs. Capable of processing multi-frame video inputs and textual queries, DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users. Furthermore, DriveGPT4 predicts low-level vehicle control signals in an end-to-end fashion. These advanced capabilities are achieved through the utilization of a bespoke visual instruction tuning dataset, specifically tailored for autonomous driving applications, in conjunction with a mix-finetuning training strategy. DriveGPT4 represents the pioneering effort to leverage LLMs for the development of an interpretable end-to-end autonomous driving solution. Evaluations conducted on the BDD-X dataset showcase the superior qualitative and quantitative performance of DriveGPT4. Additionally, fine-tuning on domain-specific data enables DriveGPT4 to yield close or even improved results in terms of autonomous driving grounding when contrasted with GPT4-V. The code and dataset will be publicly available.

Computer Vision and Pattern Recognition, Robotics

What problem does this paper attempt to address?

The paper addresses the problem of building an interpretable end-to-end autonomous driving system. The researchers propose DriveGPT4, a model that leverages large language models (LLMs) to process multimodal data, producing natural-language explanations of vehicle behavior alongside predictions of low-level control signals. Given video sequences captured by the front-facing camera, DriveGPT4 predicts the next control signals (such as vehicle speed and steering angle) and can converse with human users to explain the vehicle's actions and the reasoning behind them. A customized visual instruction tuning dataset, combined with a mix-finetuning training strategy, is intended to improve the system's transparency and interpretability while maintaining high performance. Evaluation results on the BDD-X dataset show that DriveGPT4 outperforms existing baseline methods on multiple tasks, particularly in complex driving scenarios.

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

LLM4Drive: A Survey of Large Language Models for Autonomous Driving

DriveMM: All-in-One Large Multimodal Model for Autonomous Driving

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models

DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving

Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

Probing Multimodal LLMs as World Models for Driving

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

GPT-Driver: Learning to Drive with GPT

Drive As Veteran: Fine-tuning of an Onboard Large Language Model for Highway Autonomous Driving

Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

Testing Large Language Models on Driving Theory Knowledge and Skills for Connected Autonomous Vehicles

DriveLLM: Charting the Path Toward Full Autonomous Driving with Large Language Models

Personalized Autonomous Driving with Large Language Models: Field Experiments