A Language Agent for Autonomous Driving

Jiageng Mao, Junjie Ye, Yuxi Qian, Marco Pavone, Yue Wang
2024-07-29
Abstract: Human-level driving is an ultimate goal of autonomous driving. Conventional approaches formulate autonomous driving as a perception-prediction-planning framework, yet their systems do not capitalize on the inherent reasoning ability and experiential knowledge of humans. In this paper, we propose a fundamental paradigm shift from current pipelines, exploiting Large Language Models (LLMs) as a cognitive agent to integrate human-like intelligence into autonomous driving systems. Our approach, termed Agent-Driver, transforms the traditional autonomous driving pipeline by introducing a versatile tool library accessible via function calls, a cognitive memory of common sense and experiential knowledge for decision-making, and a reasoning engine capable of chain-of-thought reasoning, task planning, motion planning, and self-reflection. Powered by LLMs, our Agent-Driver is endowed with intuitive common sense and robust reasoning capabilities, thus enabling a more nuanced, human-like approach to autonomous driving. We evaluate our approach on the large-scale nuScenes benchmark, and extensive experiments substantiate that our Agent-Driver outperforms state-of-the-art driving methods by a large margin. Our approach also demonstrates better interpretability and few-shot learning ability than these methods.
Computer Vision and Pattern Recognition, Artificial Intelligence, Computation and Language, Robotics
What problem does this paper attempt to address?
The paper addresses how to integrate human prior knowledge and reasoning abilities into autonomous driving systems so that they drive in a more human-like and efficient way. Traditional systems typically adopt a perception-prediction-planning framework, but they lack the reasoning abilities and experiential knowledge of human drivers, which leads to poor performance in complex driving scenarios. For example, on seeing a ball on the road, a human driver instinctively anticipates that a child might be chasing it and slows down. A traditional autonomous driving system might keep driving until its sensors detect the child, leaving a smaller safety margin.

To address this, the paper proposes Agent-Driver, which uses large language models (LLMs) as cognitive agents to bring human prior knowledge and reasoning abilities into autonomous driving systems. The main contributions of Agent-Driver are:

1. **Introduction of a tool library**: Extracts the necessary environmental information from neural modules through dynamic function calls, reducing redundancy.
2. **Construction of cognitive memory**: Stores common sense and driving experience to strengthen the system's decision-making.
3. **Design of a reasoning engine**: Combines environmental information and memory data to perform chain-of-thought reasoning, task planning, motion planning, and self-reflection, generating safe and comfortable driving trajectories.

Experimental results on the nuScenes benchmark show that Agent-Driver outperforms existing autonomous driving methods by a large margin, with the collision rate reduced by over 30%. The approach also demonstrates strong few-shot learning capabilities and interpretability.
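To make the three components described above more concrete, here is a minimal Python sketch of how a tool library, a cognitive memory, and a reasoning step might fit together. It is an illustration only, not the authors' implementation: every name (`ToolLibrary`, `CognitiveMemory`, `drive_step`, the tool names, and the action labels) is hypothetical, the tools return canned strings instead of neural-module outputs, retrieval is a toy word-overlap match, and a hand-written rule stands in for the LLM's chain-of-thought reasoning and self-reflection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# NOTE: all names here are hypothetical and exist only for illustration.


@dataclass
class ToolLibrary:
    """Registry of callable tools standing in for neural-module outputs."""
    tools: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[], str]) -> None:
        self.tools[name] = fn

    def call(self, name: str) -> str:
        return self.tools[name]()


@dataclass
class CognitiveMemory:
    """Common-sense rules (trigger word -> rule) and past (scene, action) pairs."""
    common_sense: Dict[str, str] = field(default_factory=dict)
    experiences: List[Tuple[str, str]] = field(default_factory=list)

    def retrieve(self, observation: str) -> Tuple[str, str]:
        # Toy retrieval: the experience sharing the most words with the observation.
        if not self.experiences:
            return ("no prior experience", "keep_speed")
        words = set(observation.lower().split())
        return max(self.experiences,
                   key=lambda e: len(words & set(e[0].lower().split())))


def drive_step(tools: ToolLibrary, memory: CognitiveMemory,
               requested: List[str]) -> Dict[str, object]:
    # 1. Gather only the requested information through function calls.
    observation = " ".join(tools.call(name) for name in requested)
    # 2. Recall the most similar past experience and reuse its action as a draft plan.
    past_scene, plan = memory.retrieve(observation)
    # 3. Record intermediate steps as text (a stand-in for LLM reasoning).
    thoughts = [f"Observed: {observation}",
                f"Similar past scene: {past_scene} -> {plan}"]
    # 4. Rule-based stand-in for self-reflection: override an incautious plan
    #    when a common-sense trigger word appears in the observation.
    for trigger, rule in memory.common_sense.items():
        if trigger in observation.lower() and plan != "slow_down":
            thoughts.append(f"Reflection: {rule}")
            plan = "slow_down"
    return {"plan": plan, "thoughts": thoughts}


# Usage: the ball-on-the-road scenario from the summary above.
tools = ToolLibrary()
tools.register("detect_objects", lambda: "ball on the road ahead")
tools.register("get_ego_state", lambda: "ego speed 8 m/s")

memory = CognitiveMemory(
    common_sense={"ball": "A ball on the road may be followed by a child."},
    experiences=[("clear road ahead", "keep_speed"),
                 ("pedestrian crossing ahead", "slow_down")],
)

result = drive_step(tools, memory, ["detect_objects", "get_ego_state"])
print(result["plan"])  # slow_down
```

In this sketch the retrieved experience alone would suggest keeping speed, and the common-sense rule about the ball overrides it. That mirrors, in a very simplified way, the role the summary assigns to cognitive memory and self-reflection. A real system would replace the canned tools with perception and prediction modules and the rule-based step with LLM calls.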