Abstract: Large language models (LLMs) have opened up new possibilities for intelligent agents, endowing them with human-like thinking and cognitive abilities. In this work, we delve into the potential of LLMs in autonomous driving (AD). We introduce DriveMLM, an LLM-based AD framework that can perform closed-loop autonomous driving in realistic simulators. To this end, (1) we bridge the gap between language decisions and vehicle control commands by standardizing the decision states according to the off-the-shelf motion planning module. (2) We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system; it takes driving rules, user commands, and inputs from various sensors (e.g., camera, lidar) as input, makes driving decisions, and provides explanations. This model can be plugged into existing AD systems such as Apollo for closed-loop driving. (3) We design an effective data engine to collect a dataset that includes decision states and corresponding explanation annotations for model training and evaluation. We conduct extensive experiments and show that our model achieves a driving score of 76.1 on CARLA Town05 Long, surpassing the Apollo baseline by 4.7 points under the same settings, which demonstrates the effectiveness of our model. We hope this work can serve as a baseline for autonomous driving with LLMs. Code and models shall be released at https://github.com/OpenGVLab/DriveMLM.
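For context on the 76.1 figure: on CARLA route-based benchmarks such as Town05 Long, the driving score is conventionally the mean over routes of route completion (0 to 100) multiplied by a multiplicative infraction penalty, so a 4.7-point gap implies an Apollo baseline of 71.4. The Python sketch below assumes the CARLA Leaderboard 1.0 penalty coefficients; the evaluation code used for DriveMLM may differ in details (for example, how off-road driving reduces route completion), so treat it as illustrative rather than as the paper's scorer.

```python
from math import prod

# Assumed: CARLA Leaderboard 1.0 penalty coefficients, applied once per infraction.
PENALTIES = {
    "collision_pedestrian": 0.50,
    "collision_vehicle": 0.60,
    "collision_static": 0.65,
    "red_light": 0.70,
    "stop_sign": 0.80,
}


def infraction_penalty(infractions: dict[str, int]) -> float:
    """Multiply each infraction's coefficient once per occurrence (1.0 if none)."""
    return prod(PENALTIES[kind] ** count for kind, count in infractions.items())


def driving_score(routes: list[tuple[float, dict[str, int]]]) -> float:
    """Average over routes of route completion (0-100) times infraction penalty."""
    return sum(rc * infraction_penalty(inf) for rc, inf in routes) / len(routes)


if __name__ == "__main__":
    # One clean, fully completed route and one 80%-completed route with a red-light infraction.
    routes = [(100.0, {}), (80.0, {"red_light": 1})]
    print(round(driving_score(routes), 2))
```

With these two routes the score is (100 + 80 × 0.7) / 2, i.e. about 78.0: a single red-light violation costs more than the missing 20% of route completion alone would suggest.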

What problem does this paper attempt to address?

The paper explores the potential applications of Large Language Models (LLMs) in Autonomous Driving (AD), in particular by aligning Multimodal Large Language Models (MLLMs) with behavioral planning states to achieve closed-loop autonomous driving. The research team proposes a framework named DriveMLM, which enables LLM-based AD systems to perform closed-loop driving tasks in a realistic simulator. To achieve this goal, the paper focuses on three aspects:

1. **Behavioral Planning State Alignment**: The researchers analyzed the decision states of the behavioral planning module of Apollo, a mature autonomous driving system, and standardized them so that LLMs can process them. This allows LLM outputs to be transformed into vehicle control signals, enabling seamless integration with existing AD systems.
2. **Multimodal LLM (MLLM) Planner Design**: An MLLM planner was developed that receives multimodal inputs, including multi-view images, LiDAR point clouds, traffic rules, and user instructions, and predicts driving decisions. The model also explains its decisions, which improves its transparency and interpretability.
3. **Efficient Data Engine**: A data collection strategy was designed to generate datasets containing decision states and corresponding explanatory annotations for model training and evaluation. The research team manually collected 280 hours of driving data in the CARLA simulator, which were converted into decision states and explanatory annotations, providing a rich data resource for model training.

Experimental results show that DriveMLM achieves a driving score of 76.1 on the CARLA Town05 Long benchmark, 4.7 points higher than the Apollo baseline under the same settings, demonstrating the model's effectiveness.
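The behavioral planning state alignment described above amounts to restricting the MLLM's output to a fixed vocabulary of decision states that the downstream motion planner already understands. The Python sketch below illustrates the idea under stated assumptions: the split into a speed decision and a path decision and all state names are hypothetical, loosely modeled on lane-level behavior-planner decisions, and the conservative fallback for unparsable output is a design choice of this sketch, not something described in the text above.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical decision vocabulary; the real standardized states may differ.
class SpeedDecision(Enum):
    KEEP = "keep"
    ACCELERATE = "accelerate"
    DECELERATE = "decelerate"
    STOP = "stop"


class PathDecision(Enum):
    FOLLOW_LANE = "follow_lane"
    LEFT_CHANGE = "left_change"
    RIGHT_CHANGE = "right_change"
    LEFT_BORROW = "left_borrow"
    RIGHT_BORROW = "right_borrow"


@dataclass(frozen=True)
class DecisionState:
    speed: SpeedDecision
    path: PathDecision


# Assumed conservative default used when the model output cannot be parsed.
SAFE_FALLBACK = DecisionState(SpeedDecision.DECELERATE, PathDecision.FOLLOW_LANE)


def parse_decision(llm_output: str) -> DecisionState:
    """Parse 'speed: ...' and 'path: ...' lines from free-form MLLM text.

    Other lines (e.g. an explanation) are read but ignored. Missing fields or
    values outside the vocabulary fall back to SAFE_FALLBACK, so the motion
    planner only ever receives a valid decision state.
    """
    fields = {}
    for line in llm_output.lower().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    try:
        return DecisionState(SpeedDecision(fields["speed"]), PathDecision(fields["path"]))
    except (KeyError, ValueError):
        return SAFE_FALLBACK


if __name__ == "__main__":
    text = "Speed: stop\nPath: follow_lane\nExplanation: the traffic light ahead is red."
    print(parse_decision(text))
```

Because every branch returns a member of the fixed vocabulary, the planner never has to interpret free-form text; turning each state into trajectories and control commands remains the job of the existing motion planning module.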
Moreover, the model can adjust its driving preferences through language instructions, such as yielding to ambulances or ignoring red lights, without changing the structure of the existing AD system, which demonstrates its flexibility and adaptability. In summary, DriveMLM not only bridges the gap between LLMs and closed-loop driving but also opens up new directions for autonomous driving through multimodal data processing and decision alignment.
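One way to read the claim that driving preferences can be changed through language without modifying the AD system: the user instruction is just another text field in the planner's prompt, while the answer format that the motion planner depends on stays fixed. The sketch below assumes a hypothetical prompt template; the field names, the `<image><lidar>` placeholder standing in for sensor tokens, and the wording are illustrative and are not DriveMLM's actual prompt.

```python
from dataclasses import dataclass


@dataclass
class PlannerQuery:
    driving_rules: list[str]
    user_instruction: str = ""
    # Hypothetical placeholder where image and LiDAR tokens would be spliced in.
    sensor_tokens: str = "<image><lidar>"


# The output contract is identical for every query, with or without an instruction.
ANSWER_FORMAT = "Answer with 'speed: <decision>' and 'path: <decision>', then a short explanation."


def build_prompt(q: PlannerQuery) -> str:
    """Assemble a text prompt from driving rules, sensor tokens and an optional instruction."""
    lines = ["System: you are the behavior planner of an autonomous vehicle."]
    lines += [f"Rule: {rule}" for rule in q.driving_rules]
    lines.append(f"Sensors: {q.sensor_tokens}")
    if q.user_instruction:
        lines.append(f"User: {q.user_instruction}")
    lines.append(ANSWER_FORMAT)
    return "\n".join(lines)


if __name__ == "__main__":
    rules = ["Stop at red lights.", "Keep a safe distance from the vehicle ahead."]
    print(build_prompt(PlannerQuery(rules, "Yield to the ambulance behind us.")))
```

Only the `User:` line differs between the default and the instruction-conditioned prompt; the last line, which defines what the planner must parse, is unchanged.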

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment

Empowering Autonomous Driving with Large Language Models: A Safety Perspective

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

LLM4Drive: A Survey of Large Language Models for Autonomous Driving

Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

Facilitating Autonomous Driving Tasks with Large Language Models

A Survey on Multimodal Large Language Models for Autonomous Driving

Probing Multimodal LLMs as World Models for Driving

DriveLLM: Charting the Path Toward Full Autonomous Driving with Large Language Models

LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs

Prompting Multi-Modal Tokens to Enhance End-to-End Autonomous Driving Imitation Learning with LLMs

Driving Everywhere with Large Language Model Policy Adaptation

Receive, Reason, and React: Drive as You Say, With Large Language Models in Autonomous Vehicles

Personalized Autonomous Driving with Large Language Models: Field Experiments

KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models

Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles
