Abstract: Traditional autonomous driving methods adopt a modular design, decomposing the driving task into sub-tasks. In contrast, end-to-end autonomous driving directly outputs actions from raw sensor data, avoiding error accumulation across modules. However, training an end-to-end model requires a comprehensive dataset; otherwise, the model generalizes poorly. Recently, large language models (LLMs) have been applied to enhance the generalization capabilities of end-to-end driving models. Most studies explore LLMs in an open-loop manner, where the output actions are compared to those of experts without direct feedback from the real world, while others examine closed-loop results only in simulations. This paper proposes an efficient architecture that integrates multimodal LLMs into end-to-end driving models operating in closed-loop settings in real-world environments. In our architecture, the LLM periodically processes raw sensor data to generate high-level driving instructions, which effectively guide the end-to-end model even when the LLM runs at a slower rate than the raw sensor data arrives. This architecture relaxes the trade-off between the latency and inference quality of the LLM. It also allows us to choose from a wide variety of LLMs to improve high-level driving instructions and minimize fine-tuning costs. In addition, our architecture reduces data collection requirements because the LLM does not directly output actions; we only need to train a simple imitation learning model to output actions. In our experiments, the training data for the end-to-end model in a real-world environment consists of only simple obstacle configurations with one traffic cone, while the test environment is more complex and contains multiple obstacles placed in various positions. Experiments show that the proposed architecture enhances the generalization capabilities of the end-to-end model even without fine-tuning the LLM.
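The two-rate structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `DualRateController`, the functions `toy_instruct` and `toy_policy`, the scalar observation, the instruction strings, and the refresh period are all hypothetical stand-ins chosen for illustration. The sketch only shows the idea that a slow instruction source is queried every few control steps while a fast policy acts at every step using the most recent instruction.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DualRateController:
    """Fast policy conditioned on a cached, slowly refreshed instruction."""
    instruct: Callable[[float], str]       # hypothetical stand-in for the multimodal LLM
    policy: Callable[[float, str], float]  # hypothetical stand-in for the imitation-learning model
    llm_period: int = 5                    # assumed: refresh the instruction every N control steps
    _instruction: str = "go straight"      # assumed default before the first refresh
    _step: int = 0

    def step(self, observation: float) -> Tuple[str, float]:
        # Query the slow instruction source only on every llm_period-th step.
        if self._step % self.llm_period == 0:
            self._instruction = self.instruct(observation)
        self._step += 1
        # The fast policy runs on every step with the latest cached instruction.
        return self._instruction, self.policy(observation, self._instruction)


def toy_instruct(obstacle_offset: float) -> str:
    # Toy rule: an obstacle to the right (positive offset) means steer left.
    return "steer left" if obstacle_offset > 0 else "steer right"


def toy_policy(obstacle_offset: float, instruction: str) -> float:
    # Toy mapping from instruction to a steering command; ignores the observation.
    return {"steer left": 0.5, "steer right": -0.5}.get(instruction, 0.0)


if __name__ == "__main__":
    controller = DualRateController(toy_instruct, toy_policy, llm_period=3)
    for obs in [1.0, -1.0, -1.0, -1.0]:
        print(controller.step(obs))
```

In this toy run the instruction stays fixed for three control steps even though the observation changes after the first step, and it is refreshed on the fourth. A real system would likely run the LLM asynchronously rather than inside the control loop; the synchronous counter here is a simplification.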

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

LLM4Drive: A Survey of Large Language Models for Autonomous Driving

LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

DriveLLM: Charting the Path Toward Full Autonomous Driving with Large Language Models

Facilitating Autonomous Driving Tasks with Large Language Models

Personalized Autonomous Driving with Large Language Models: Field Experiments

Driving Everywhere with Large Language Model Policy Adaptation

Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving

Empowering Autonomous Driving with Large Language Models: A Safety Perspective

Receive, Reason, and React: Drive as You Say, With Large Language Models in Autonomous Vehicles

Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs

Instruct Large Language Models to Drive like Humans

Probing Multimodal LLMs as World Models for Driving

Evaluation of Large Language Models for Decision Making in Autonomous Driving