Making Large Language Models Better Planners with Reasoning-Decision Alignment

Zhijian Huang, Tao Tang, Shaoxiang Chen, Sihao Lin, Zequn Jie, Lin Ma, Guangrun Wang, Xiaodan Liang

2024-08-26

Abstract: Data-driven approaches for autonomous driving (AD) have been widely adopted in the past decade but are confronted with dataset bias and uninterpretability. Inspired by the knowledge-driven nature of human driving, recent approaches explore the potential of large language models (LLMs) to improve understanding and decision-making in traffic scenarios. They find that the pretrain-finetune paradigm of LLMs on downstream data with the Chain-of-Thought (CoT) reasoning process can enhance explainability and scene understanding. However, this popular strategy suffers from a notorious misalignment between the crafted CoTs and the consequent decision-making, which remains untouched by previous LLM-based AD methods. To address this problem, we motivate an end-to-end decision-making model based on a multimodality-augmented LLM, which simultaneously executes CoT reasoning and produces planning results. Furthermore, we propose a reasoning-decision alignment constraint between the paired CoTs and planning results, imposing the correspondence between reasoning and decision-making. Moreover, we redesign the CoTs to enable the model to comprehend complex scenarios and enhance decision-making performance. We dub our proposed large language planners with reasoning-decision alignment RDA-Driver. Experimental evaluations on the nuScenes and DriveLM-nuScenes benchmarks demonstrate the effectiveness of our RDA-Driver in enhancing the performance of end-to-end AD systems. Specifically, our RDA-Driver achieves state-of-the-art planning performance on the nuScenes dataset with 0.80 L2 error and 0.32 collision rate, and also achieves leading results on the challenging DriveLM-nuScenes benchmark with 0.82 L2 error and 0.38 collision rate.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

The paper addresses the inconsistency between reasoning and decision-making in large language models (LLMs) applied to autonomous driving (AD). Current LLM-based driving systems use Chain-of-Thought (CoT) reasoning to improve interpretability and scene understanding, but their CoT scores are not consistent with the accuracy of the decisions they actually make. This inconsistency can lead to erroneous driving decisions and so undermines the safety and reliability of the system.

To solve this problem, the authors propose RDA-Driver, an end-to-end decision-making model based on a multimodality-augmented LLM. The model performs CoT reasoning and produces planning results at the same time, and a reasoning-decision alignment constraint keeps the reasoning process consistent with the final decision. The authors also redesign the CoT structure so that the model better understands complex scenarios and makes better decisions.

Experimental results show that RDA-Driver achieves state-of-the-art planning performance on nuScenes (0.80 L2 error, 0.32 collision rate) and leading results on DriveLM-nuScenes (0.82 L2 error, 0.38 collision rate), which demonstrates its effectiveness in improving end-to-end autonomous driving systems.
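To make the two quantities in this summary concrete, here is a minimal Python sketch. It assumes that L2 error is the mean Euclidean distance between planned and reference waypoints, and it uses a simple pairwise check to quantify how often a higher-scored CoT comes with a worse plan. The function names `l2_error` and `discordant_fraction`, the scalar CoT scores, and the pairwise check are illustrative assumptions; this is not the alignment constraint or the evaluation protocol used in the paper.

```python
import math
from itertools import combinations


def l2_error(planned, reference):
    """Mean Euclidean distance between matching (x, y) waypoints.

    Assumption: both trajectories are sampled at the same time steps.
    """
    if len(planned) != len(reference) or not planned:
        raise ValueError("trajectories must be non-empty and equally long")
    total = sum(math.dist(p, r) for p, r in zip(planned, reference))
    return total / len(planned)


def discordant_fraction(cot_scores, l2_errors):
    """Fraction of sample pairs where the higher-scored CoT has the larger error.

    Hypothetical misalignment measure: 0.0 means every higher-scored CoT
    is paired with a more accurate plan; 1.0 means the opposite.
    Ties in either quantity are not counted as discordant.
    """
    if len(cot_scores) != len(l2_errors):
        raise ValueError("need one L2 error per CoT score")
    pairs = list(combinations(range(len(cot_scores)), 2))
    if not pairs:
        return 0.0
    discordant = sum(
        1
        for i, j in pairs
        if (cot_scores[i] - cot_scores[j]) * (l2_errors[i] - l2_errors[j]) > 0
    )
    return discordant / len(pairs)


if __name__ == "__main__":
    planned = [(0.0, 0.0), (3.0, 4.0)]
    reference = [(0.0, 0.0), (0.0, 0.0)]
    print(l2_error(planned, reference))
    print(discordant_fraction([0.9, 0.5, 0.1], [0.2, 0.6, 1.0]))
```

A training-time alignment constraint would have to be differentiable, for example a ranking-style loss over paired CoTs and plans; the check above only illustrates the kind of mismatch such a constraint is meant to reduce.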

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving

Asynchronous Large Language Model Enhanced Planner for Autonomous Driving

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

Instruct Large Language Models to Drive like Humans

A Language Agent for Autonomous Driving

Empowering Autonomous Driving with Large Language Models: A Safety Perspective

OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

Towards Interactive and Learnable Cooperative Driving Automation: a Large Language Model-Driven Decision-Making Framework

LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning

Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment

Receive, Reason, and React: Drive as You Say, With Large Language Models in Autonomous Vehicles

Driving Everywhere with Large Language Model Policy Adaptation

AD-H: Autonomous Driving with Hierarchical Agents

Integrating visual large language model and reasoning chain for driver behavior analysis and risk assessment

Language Model Non-myopic Generation for Reasoning and Planning

Non-myopic Generation of Language Models for Reasoning and Planning

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models