Making Large Language Models Better Planners with Reasoning-Decision Alignment

Zhijian Huang, Tao Tang, Shaoxiang Chen, Sihao Lin, Zequn Jie, Lin Ma, Guangrun Wang, Xiaodan Liang
2024-08-26
Abstract: Data-driven approaches for autonomous driving (AD) have been widely adopted in the past decade but are confronted with dataset bias and uninterpretability. Inspired by the knowledge-driven nature of human driving, recent approaches explore the potential of large language models (LLMs) to improve understanding and decision-making in traffic scenarios. They find that the pretrain-finetune paradigm of LLMs on downstream data with the Chain-of-Thought (CoT) reasoning process can enhance explainability and scene understanding. However, this popular strategy suffers from the notorious problem of misalignment between the crafted CoTs and the consequent decision-making, which previous LLM-based AD methods leave unaddressed. To address this problem, we introduce an end-to-end decision-making model based on a multimodality-augmented LLM, which simultaneously performs CoT reasoning and produces planning results. Furthermore, we propose a reasoning-decision alignment constraint between the paired CoTs and planning results, imposing correspondence between reasoning and decision-making. Moreover, we redesign the CoTs to enable the model to comprehend complex scenarios and enhance decision-making performance. We dub our proposed large language planners with reasoning-decision alignment RDA-Driver. Experimental evaluations on the nuScenes and DriveLM-nuScenes benchmarks demonstrate the effectiveness of RDA-Driver in enhancing the performance of end-to-end AD systems. Specifically, RDA-Driver achieves state-of-the-art planning performance on the nuScenes dataset with 0.80 L2 error and 0.32 collision rate, and also achieves leading results on the challenging DriveLM-nuScenes benchmark with 0.82 L2 error and 0.38 collision rate.
Computer Vision and Pattern Recognition
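The abstract does not spell out the exact form of the reasoning-decision alignment constraint. As an illustration only, one common way to impose such a correspondence is a margin ranking loss: the model's length-normalized log-likelihood of a matched (CoT, planning result) answer should exceed that of mismatched answers (hypothetically, a CoT paired with a perturbed trajectory) by at least a margin. The function names `sequence_score` and `alignment_loss`, the length normalization, and the `margin` value are assumptions made for this sketch, not the paper's published formulation.

```python
import math
from typing import Sequence


def sequence_score(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-likelihood of one answer sequence (hypothetical helper)."""
    if not token_logprobs:
        raise ValueError("token_logprobs must be non-empty")
    return sum(token_logprobs) / len(token_logprobs)


def alignment_loss(
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    margin: float = 0.1,  # assumed hyperparameter, not taken from the paper
) -> float:
    """Hinge (margin ranking) loss: the matched CoT/plan answer should outscore
    every mismatched answer by at least `margin`; averaged over negatives."""
    if not negatives:
        return 0.0
    s_pos = sequence_score(positive)
    losses = [max(0.0, margin - (s_pos - sequence_score(neg))) for neg in negatives]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Per-token log-probabilities the model assigns to each candidate answer.
    matched = [math.log(0.9), math.log(0.8)]
    mismatched = [
        [math.log(0.3), math.log(0.4)],   # clearly worse: contributes zero loss
        [math.log(0.85), math.log(0.8)],  # nearly as likely: inside the margin, penalized
    ]
    print(alignment_loss(matched, mismatched))
```

In a real training loop the log-probabilities would come from the LLM's forward pass and the loss would be added to the usual language-modeling objective; the hinge form only pushes on negatives that fall within the margin, leaving already well-separated pairs untouched.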
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

The paper addresses the inconsistency between the reasoning and the decisions of large language models (LLMs) used for autonomous driving (AD). Current LLM-based driving systems use Chain-of-Thought (CoT) reasoning to improve interpretability and scene understanding, but the CoTs they produce are not necessarily consistent with the decisions that follow: CoT quality scores do not track decision accuracy. Such a mismatch can lead to erroneous driving decisions, undermining the safety and reliability of the system.

To solve this problem, the authors propose RDA-Driver, an end-to-end decision-making model built on a multimodality-augmented LLM. The model performs CoT reasoning and outputs planning results at the same time, and a reasoning-decision alignment constraint between each CoT and its paired planning result enforces consistency between the reasoning process and the final decision. The authors also redesign the CoT structure so that the model can better understand complex scenarios and make better decisions.

Experiments show that RDA-Driver achieves state-of-the-art planning performance on nuScenes, with 0.80 L2 error and 0.32 collision rate, and leading results on DriveLM-nuScenes, with 0.82 L2 error and 0.38 collision rate. These results demonstrate the effectiveness of RDA-Driver in improving end-to-end autonomous driving systems.
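Both benchmarks report planning quality as an L2 error and a collision rate. A minimal sketch of the L2 part follows, assuming each trajectory is a list of (x, y) waypoints in the ego frame (units assumed to be metres, waypoints assumed to be evenly spaced in time). Evaluation protocols on nuScenes differ in whether they average the error over all steps up to a horizon or report it at the horizon alone; the cumulative-average variant here is chosen purely for illustration, and the collision rate, which requires object bounding boxes, is omitted. The names `l2_errors` and `average_l2` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Waypoint = Tuple[float, float]  # (x, y) in the ego frame; metres assumed


def l2_errors(pred: Sequence[Waypoint], gt: Sequence[Waypoint]) -> List[float]:
    """Per-waypoint Euclidean distance between predicted and ground-truth trajectories."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of waypoints")
    return [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]


def average_l2(pred: Sequence[Waypoint], gt: Sequence[Waypoint], horizon_steps: int) -> float:
    """Mean L2 error over the first `horizon_steps` waypoints (cumulative-average convention)."""
    errors = l2_errors(pred, gt)
    if not 1 <= horizon_steps <= len(errors):
        raise ValueError("horizon_steps must be between 1 and the trajectory length")
    return sum(errors[:horizon_steps]) / horizon_steps


if __name__ == "__main__":
    gt = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
    pred = [(0.0, 1.0), (0.0, 2.0), (3.0, 7.0)]  # last waypoint drifts by a 3-4-5 offset
    print(l2_errors(pred, gt))  # [0.0, 0.0, 5.0]
    print(average_l2(pred, gt, horizon_steps=3))
```

A benchmark score would then average this quantity over all evaluation samples and report it at several horizons.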