Abstract: The large-scale deployment of autonomous vehicles is yet to come, and one of the major remaining challenges lies in dense urban traffic scenarios. In such cases, it remains challenging to predict the future evolution of the scene and the future behaviors of objects, and to deal with rare adverse events such as the sudden appearance of occluded objects. In this paper, we present ReasonNet, a novel end-to-end driving framework that extensively exploits both temporal and global information of the driving scene. By reasoning on the temporal behavior of objects, our method can effectively process the interactions and relationships among features in different frames. Reasoning about the global information of the scene can also improve overall perception performance and benefit the detection of adverse events, especially the anticipation of potential danger from occluded objects. For comprehensive evaluation on occlusion events, we also publicly release a driving simulation benchmark, DriveOcclusionSim, consisting of diverse occlusion events. We conduct extensive experiments on multiple CARLA benchmarks, where our model outperforms all prior methods, ranking first on the sensor track of the public CARLA Leaderboard.

What problem does this paper attempt to address?

The paper addresses two major challenges that autonomous vehicles face in dense urban traffic: first, how to understand the driving scene comprehensively and predict its future evolution with high fidelity; second, how to handle rare, long-tail adverse events, such as relevant objects that go undetected because they are hidden in occluded areas.

To tackle these issues, the paper proposes ReasonNet, a new end-to-end driving framework that exploits both the temporal and the global information of the driving scene. Reasoning over temporal behavior lets ReasonNet handle the interactions and relationships among features in different frames, while reasoning about the global information of the scene improves overall perception performance, especially the anticipation of potential danger from occluded objects. The paper also introduces a driving simulation benchmark, DriveOcclusionSim (Driving in Occlusion Simulation, DOS), which covers a variety of occlusion events, for comprehensive evaluation under occlusion. On multiple CARLA benchmarks, the model outperforms all prior methods and ranks first on the sensor track of the public CARLA Leaderboard.

The main contributions of the paper are:

1. Proposing a new Temporal and Global Reasoning Network (ReasonNet) that enhances reasoning about historical scenes and achieves high-fidelity predictions of future scene evolution, improving global context awareness even under occlusion.
2. Constructing and publicly releasing a new benchmark, Driving in Occlusion Simulation (DOS), which contains a variety of occlusion scenarios in urban driving and serves as a systematic benchmark for evaluating occlusion events in end-to-end autonomous driving.
3. Validating the method on multiple benchmarks that include complex and adversarial urban scenarios, with the model ranking first on the sensor track of the CARLA autonomous driving leaderboard.

By proposing ReasonNet and the DOS benchmark, the paper aims to advance the perception and decision-making capabilities of autonomous vehicles when facing occluded objects and predicting future scenes, toward safer and more reliable autonomous driving.
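
The summary says that ReasonNet reasons over features from different frames, but it gives no architectural detail. As a rough illustration only, and not the authors' implementation, the sketch below shows one generic way to relate a current-frame feature to features stored from earlier frames: scaled dot-product attention followed by a residual sum. The function name `temporal_attention`, the toy feature vectors, and the residual fusion step are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(current, history):
    """Fuse a current-frame feature with features from past frames.

    Hypothetical helper for illustration; not taken from the paper.
    current: list of floats, the current frame's feature (query).
    history: list of feature lists from earlier frames (keys and values).
    Returns (fused_feature, attention_weights).
    """
    dim = len(current)
    scale = math.sqrt(dim)
    # Similarity between the current frame and each past frame.
    scores = [
        sum(q * k for q, k in zip(current, past)) / scale
        for past in history
    ]
    weights = softmax(scores)
    # Weighted average of the past features (the temporal context).
    context = [
        sum(w * past[i] for w, past in zip(weights, history))
        for i in range(dim)
    ]
    # Residual fusion (an assumption): keep the current feature, add context.
    fused = [x + c for x, c in zip(current, context)]
    return fused, weights


if __name__ == "__main__":
    current_frame = [1.0, 0.0]
    past_frames = [[1.0, 0.0], [0.0, 1.0]]
    fused, weights = temporal_attention(current_frame, past_frames)
    print("attention weights:", weights)
    print("fused feature:", fused)
```

In this toy setup the past frame that resembles the current one receives the larger weight, which is the basic mechanism by which attention lets earlier observations inform the current representation.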

ReasonNet: End-to-End Driving with Temporal and Global Reasoning

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving

Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-v2)

MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving

World knowledge-enhanced Reasoning Using Instruction-guided Interactor in Autonomous Driving

Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving (in CARLA-v2)

DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving

Disentangling Perception-failure-induced Corner Cases by Counterfactual Reasoning

Driving with Regulation: Interpretable Decision-Making for Autonomous Vehicles with Retrieval-Augmented Reasoning via LLM

End-to-End Autonomous Driving With Semantic Depth Cloud Mapping and Multi-Agent

Making Large Language Models Better Planners with Reasoning-Decision Alignment

Think Twice Before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving

Graph-based Topology Reasoning for Driving Scenes

ShuDA-RFBNet for Real-time Multi-task Traffic Scene Perception

Attention-Based Interrelation Modeling for Explainable Automated Driving

Learning Visual Abstract Reasoning through Dual-Stream Networks

DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving

End-to-End Urban Autonomous Driving With Safety Constraints

DriveLM: Driving with Graph Visual Question Answering