Abstract: In recent years, autonomous driving algorithms that use low-cost vehicle-mounted cameras have attracted increasing attention from both academia and industry. This effort spans multiple fronts, including on-road object detection and 3-D reconstruction, but in this work we focus on a vision-based model that directly maps raw input images to steering angles using deep networks. This represents a nascent research topic in computer vision. The technical contributions of this work are three-fold. First, the model is learned and evaluated on real human driving videos that are time-synchronized with other vehicle sensors. This differs from many prior models trained on synthetic data from racing games. Second, state-of-the-art models, such as PilotNet, mostly predict the wheel angle independently for each video frame, which contradicts the common understanding of driving as a stateful process. Instead, our proposed model combines spatial and temporal cues, jointly exploiting instantaneous monocular camera observations and the vehicle's historical states. In practice, this is accomplished by inserting carefully designed recurrent units (e.g., LSTM and Conv-LSTM) at appropriate network layers. Third, to facilitate the interpretability of the learned model, we use a visual back-propagation scheme to discover and visualize the image regions that most strongly influence the final steering prediction. Our experimental study is based on about 6 hours of human driving data provided by Udacity. Comprehensive quantitative evaluations demonstrate the effectiveness and robustness of our model, even under scenarios such as drastic lighting changes and abrupt turns. The comparison with other state-of-the-art models clearly shows its superior performance in predicting the appropriate wheel angle for a self-driving car.
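
To make the contrast between per-frame and stateful prediction concrete, the following is a minimal sketch, not the architecture described in the abstract. It assumes each frame has already been reduced to a small feature vector (standing in for convolutional features), uses a single hand-written LSTM cell with random, untrained weights, and feeds back the previous predicted angle as a stand-in for the vehicle's historical state. The class name `TinyLSTMSteering` and all dimensions are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinyLSTMSteering:
    """Hypothetical toy model: frame features + previous angle -> LSTM -> angle."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # Each gate sees the frame features, the previous angle, and the
        # previous hidden state concatenated into one input vector.
        in_dim = feat_dim + 1 + hidden_dim
        self.hidden_dim = hidden_dim
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
            for gate in self.GATES
        }
        self.b = {gate: [0.0] * hidden_dim for gate in self.GATES}
        # Linear read-out from the hidden state to a scalar steering angle.
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]

    def step(self, feat, prev_angle, h, c):
        """Run one LSTM step and return (angle, new_h, new_c)."""
        x = list(feat) + [prev_angle] + list(h)
        pre = {
            gate: [a + b for a, b in zip(matvec(self.W[gate], x), self.b[gate])]
            for gate in self.GATES
        }
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g = [math.tanh(v) for v in pre["candidate"]]
        # Standard LSTM update: the cell state carries memory across frames.
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        angle = sum(w * hv for w, hv in zip(self.w_out, h))
        return angle, h, c

    def predict_sequence(self, frame_feats, init_angle=0.0):
        """Predict one angle per frame, carrying state from frame to frame."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        angle = init_angle
        angles = []
        for feat in frame_feats:
            angle, h, c = self.step(feat, angle, h, c)
            angles.append(angle)
        return angles


# Three identical frames: a frame-independent model would output the same
# angle three times, while the recurrent model's output depends on history.
model = TinyLSTMSteering(feat_dim=4, hidden_dim=3)
angles = model.predict_sequence([[0.1, 0.2, 0.3, 0.4]] * 3)
```

Because the hidden and cell states persist across calls to `step`, the same frame features can yield different angles at different points in a sequence. That dependence on history is what a frame-independent predictor cannot express. A trained model would learn the weights from driving data rather than sample them at random.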

Learning single-shot vehicle orientation estimation from large-scale street panoramas.

Design of an Enhanced Visual Odometry by Building and Matching Compressive Panoramic Landmarks Online

A Panoramic Localizer Based on Coarse-to-Fine Descriptors for Navigation Assistance

Semantic Visual Odometry Based on Panoramic Annular Imaging

Learning from Maps: Visual Common Sense for Autonomous Driving

3D Orientation Estimation and Vanishing Point Extraction from Single Panoramas Using Convolutional Neural Network

Beyond Geo-localization: Fine-grained Orientation of Street-view Images by Cross-view Matching with Satellite Imagery with Supplementary Materials

Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference

Vehicle 3D Localization in Road Scenes via a Monocular Moving Camera

PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes

Vehicle Global 6-Dof Pose Estimation under Traffic Surveillance Camera

Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues

Learning End-to-End Autonomous Steering Model from Spatial and Temporal Visual Cues

OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping

Real-time Vehicle Localization and Tracking Using Monocular Panomorph Panoramic Vision

Understanding Bird's-Eye View of Road Semantics using an Onboard Camera

Panoramic Direct LiDAR-assisted Visual Odometry

Disentangling and Vectorization: A 3D Visual Perception Approach for Autonomous Driving Based on Surround-View Fisheye Cameras

Multi-task Panoramic Driving Perception Algorithm Based on Improved YOLOv5

Fine-grained Traffic Video Vehicle Recognition Based Orientation Estimation and Temporal Information