Abstract: Learning contextual and spatial environmental representations enhances an autonomous vehicle’s hazard anticipation and decision-making in complex scenarios. Recent perception systems enhance spatial understanding with sensor fusion but often lack global environmental context. Humans, when driving, naturally employ neural maps that integrate various factors such as historical data, situational subtleties, and behavioral predictions of other road users to form a rich contextual understanding of their surroundings. This neural map-based comprehension is integral to making informed decisions on the road. In contrast, despite significant advances, autonomous systems have yet to fully harness this depth of human-like contextual understanding. Motivated by this, our work draws inspiration from human driving patterns and seeks to formalize the sensor fusion approach within an end-to-end autonomous driving framework. We introduce a framework that integrates three cameras (left, right, and center) to emulate the human field of view, coupled with top-down bird’s-eye-view semantic data to enhance contextual representation. The sensor data is fused and encoded using a self-attention mechanism, and the encoded features feed an auto-regressive waypoint prediction module. We treat feature representation as a sequential problem, employing a vision transformer to distill the contextual interplay between sensor modalities. The efficacy of the proposed method is experimentally evaluated in both open-loop and closed-loop settings. Our method achieves a displacement error of 0.67 m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset. In closed-loop evaluations on CARLA’s Town05 Long and Longest6 benchmarks, the proposed method improves driving performance and route completion and reduces infractions.
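The open-loop result above is reported as a displacement error in metres. The abstract does not state the exact definition, so the following is only a minimal sketch, assuming the metric is the mean Euclidean (L2) distance between predicted and ground-truth waypoints over a planning horizon. The function name `average_displacement_error` and the sample waypoints are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

# Assumption: a waypoint is an (x, y) position in metres in the ego-vehicle frame.
Waypoint = Tuple[float, float]


def average_displacement_error(
    predicted: Sequence[Waypoint],
    ground_truth: Sequence[Waypoint],
) -> float:
    """Mean L2 distance between matching predicted and ground-truth waypoints."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        raise ValueError("at least one waypoint is required")
    # Sum the per-waypoint Euclidean distances, then average over the horizon.
    total = sum(
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    )
    return total / len(predicted)


# Hypothetical three-step trajectory; the values are illustrative only.
predicted = [(0.0, 1.0), (0.0, 2.0), (0.5, 3.0)]
ground_truth = [(0.0, 1.0), (0.3, 2.4), (0.5, 3.0)]
ade = average_displacement_error(predicted, ground_truth)
print(f"average displacement error: {ade:.3f} m")
```

In this toy trajectory only the second waypoint deviates, by 0.5 m, so the average over three waypoints is about 0.167 m. A lower value means the predicted trajectory stays closer to the recorded one.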

Increasing the Efficiency of Policy Learning for Autonomous Vehicles by Multi-Task Representation Learning

Efficient Latent Representations using Multiple Tasks for Autonomous Driving

Learning an Efficient and Safe Policy for Highway Driving Using Supervised Learning and Reinforcement Learning.

Efficient Learning of Urban Driving Policies Using Bird's-Eye-View State Representations

Towards Learning Generalizable Driving Policies from Restricted Latent Representations

Deep Occupancy-Predictive Representations for Autonomous Driving

Policy-Based Reinforcement Learning for Training Autonomous Driving Agents in Urban Areas With Affordance Learning

Exploiting Multi-Modal Fusion for Urban Autonomous Driving Using Latent Deep Reinforcement Learning

Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving

Action-Based Representation Learning for Autonomous Driving

Multi-Vehicle Mixed-Reality Reinforcement Learning for Autonomous Multi-Lane Driving

Dynamic Environment Prediction in Urban Scenes using Recurrent Representation Learning

Conditional Affordance Learning for Driving in Urban Environments

CARNet: A Dynamic Autoencoder for Learning Latent Dynamics in Autonomous Driving Tasks

Learning predictive representations in autonomous driving to improve deep reinforcement learning

Conditional Vehicle Trajectories Prediction in CARLA Urban Environment

Model-free Deep Reinforcement Learning for Urban Autonomous Driving

Multi-modal policy fusion for end-to-end autonomous driving

Enhancing State Representation in Multi-Agent Reinforcement Learning for Platoon-Following Models

Interpretable End-to-End Urban Autonomous Driving With Latent Deep Reinforcement Learning