ReasonNet: End-to-End Driving with Temporal and Global Reasoning

Hao Shao, Letian Wang, Ruobing Chen, Steven L. Waslander, Hongsheng Li, Yu Liu
2023-05-18
Abstract: The large-scale deployment of autonomous vehicles is yet to come, and one of the major remaining challenges lies in dense urban traffic scenarios. In such cases, it remains challenging to predict the future evolution of the scene and the future behaviors of objects, and to deal with rare adverse events such as the sudden appearance of occluded objects. In this paper, we present ReasonNet, a novel end-to-end driving framework that extensively exploits both temporal and global information of the driving scene. By reasoning on the temporal behavior of objects, our method can effectively process the interactions and relationships among features in different frames. Reasoning about the global information of the scene can also improve overall perception performance and benefit the detection of adverse events, especially the anticipation of potential danger from occluded objects. For a comprehensive evaluation of occlusion events, we also publicly release DriveOcclusionSim, a driving simulation benchmark consisting of diverse occlusion events. We conduct extensive experiments on multiple CARLA benchmarks, where our model outperforms all prior methods, ranking first on the sensor track of the public CARLA Leaderboard.
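The temporal reasoning described in the abstract can be illustrated with a minimal sketch. It assumes (our assumption, not necessarily the authors' design) that features from past frames are kept in a fixed-length memory and that current-frame features attend to them with scaled dot-product cross-attention, with the attended result added back residually. The names `TemporalMemory` and `temporal_attention`, the memory length, and the residual fusion are hypothetical.

```python
import math
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class TemporalMemory:
    """Hypothetical fixed-length buffer of per-frame feature lists."""

    def __init__(self, max_frames):
        # deque discards the oldest frame automatically once max_frames is reached
        self.frames = deque(maxlen=max_frames)

    def push(self, frame_feats):
        self.frames.append(frame_feats)

    def flatten(self):
        # All stored feature vectors from all remembered frames, oldest first
        return [f for frame in self.frames for f in frame]


def temporal_attention(query_feats, memory_feats):
    """Fuse current-frame features with past-frame features.

    Each query vector attends to every memory vector (scaled dot-product
    cross-attention); the attended vector is added to the query (residual).
    """
    if not memory_feats:
        # No history yet: pass the current features through unchanged
        return [list(q) for q in query_feats]
    d = len(query_feats[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in query_feats:
        weights = softmax([dot(q, k) * scale for k in memory_feats])
        attended = [sum(w * m[i] for w, m in zip(weights, memory_feats)) for i in range(d)]
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


memory = TemporalMemory(max_frames=3)
for t in range(4):  # push 4 frames; only the last 3 are kept
    memory.push([[float(t), 1.0]])
current = [[1.0, 0.0]]
print(temporal_attention(current, memory.flatten()))
```

The bounded deque keeps memory cost constant as a driving episode runs; how ReasonNet actually stores and selects historical features is not described in this summary.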
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper aims to address two major challenges that autonomous vehicles face in dense urban traffic: first, how to achieve a comprehensive understanding of the driving scene and make high-fidelity predictions of how the scene will evolve; second, how to handle rare adverse events in the long tail of the distribution, such as relevant but undetected objects in occluded areas.

To tackle these issues, the paper proposes ReasonNet, a new end-to-end driving framework that fully exploits the temporal and global information of the driving scene. Reasoning over temporal behavior lets ReasonNet model the interactions and relationships among features from different frames, while reasoning about the global information of the scene improves overall perception, especially the anticipation of potential dangers such as hazards emerging from occluded objects. The paper also introduces a driving simulation benchmark, DriveOcclusionSim (DOS), which contains a variety of occlusion events for comprehensively evaluating performance under occlusion. On multiple CARLA benchmarks, the model outperformed all previous methods and ranked first on the sensor track of the public CARLA Leaderboard.

The main contributions of the paper are:
1. A new Temporal and Global Reasoning Network (ReasonNet) that strengthens reasoning over historical scenes, achieves high-fidelity predictions of future scene evolution, and improves global context awareness even under occlusion.
2. A new public benchmark, DriveOcclusionSim (DOS), containing diverse occlusion scenarios in urban driving, which serves as a systematic benchmark for evaluating occlusion events in end-to-end autonomous driving.
3. Validation of the method on multiple benchmarks with complex and adversarial urban scenarios, where the model ranks first on the sensor track of the CARLA autonomous driving leaderboard.

By proposing ReasonNet and the DOS benchmark, the paper aims to advance the perception and decision-making capabilities of autonomous vehicles when facing occluded objects and predicting future scenes, toward safer and more reliable autonomous driving.
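The global reasoning idea can likewise be sketched, under our own assumption, as one round of attention-based message passing over a fully connected scene graph whose nodes are object features and environment features (for example, features summarizing an occluded region), so that each node's representation absorbs scene-wide context. The node layout and the `global_reasoning_step` function are hypothetical illustrations, not the paper's architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_reasoning_step(nodes):
    """One round of message passing over a fully connected scene graph.

    nodes: list of d-dim feature vectors (objects and environment elements).
    Every node attends to all nodes, itself included, and adds the
    attention-weighted message to its own features (residual update).
    """
    d = len(nodes[0])
    scale = 1.0 / math.sqrt(d)
    updated = []
    for q in nodes:
        scores = [sum(a * b for a, b in zip(q, k)) * scale for k in nodes]
        weights = softmax(scores)
        message = [sum(w * n[i] for w, n in zip(weights, nodes)) for i in range(d)]
        updated.append([qi + mi for qi, mi in zip(q, message)])
    return updated


# Hypothetical scene: two detected objects plus one environment node
# (e.g. features summarizing an occluded region near a crosswalk).
scene = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
for node in global_reasoning_step(scene):
    print([round(x, 3) for x in node])
```

Stacking several such rounds, or restricting edges to spatially nearby nodes, are common variants; which form ReasonNet uses is not stated in this summary.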