Abstract: Autonomous vehicle navigation is a key challenge in artificial intelligence, requiring robust and accurate decision-making processes. This research introduces a new end-to-end method that exploits multimodal information from a single monocular camera to improve steering predictions for self-driving cars. Conventional models either require several sensors, which can be costly and complex, or rely exclusively on RGB images, which may not be robust enough under varying conditions. In contrast, our model significantly improves vehicle steering prediction performance using a single visual sensor. We propose a comprehensive framework that fuses RGB imagery with depth completion information or optical flow data through both early and hybrid fusion techniques. We implement our approach with three distinct neural network models: Convolutional Neural Network - Neural Circuit Policy (CNN-NCP), Variational Auto Encoder - Long Short-Term Memory (VAE-LSTM), and Variational Auto Encoder - Neural Circuit Policy (VAE-NCP). Empirical results from our comparative study on Boston driving data show that our model, which integrates image and motion information, is robust and reliable. It outperforms state-of-the-art approaches that do not use optical flow, reducing the steering estimation error by 31%. This demonstrates the potential of optical flow data, combined with advanced neural network architectures (a CNN-based structure for fusing data and a recurrent network for inferring a command from the latent space), to enhance steering estimation for autonomous vehicles.
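As a rough illustration of what early fusion of RGB and optical flow can look like, the sketch below stacks the two modalities along the channel axis before any network sees them. The abstract does not specify tensor shapes or normalisation, so the function name `early_fusion`, the (H, W, 3) and (H, W, 2) layouts, and the scaling choices are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def early_fusion(rgb: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Stack an RGB frame (H, W, 3) and a flow field (H, W, 2) into (H, W, 5).

    Hypothetical helper: the shapes and normalisation are assumptions.
    """
    if rgb.shape[:2] != flow.shape[:2]:
        raise ValueError("rgb and flow must share spatial dimensions")

    # Assumption: 8-bit RGB input, scaled to [0, 1].
    rgb_scaled = rgb.astype(np.float32) / 255.0

    # Assumption: flow is scaled by its largest per-pixel magnitude so that
    # both modalities live in a comparable numeric range.
    magnitude = np.linalg.norm(flow, axis=-1)
    scale = max(float(magnitude.max()), 1e-6)
    flow_scaled = flow.astype(np.float32) / scale

    # Early fusion: concatenate along the channel axis.
    return np.concatenate([rgb_scaled, flow_scaled], axis=-1)


# Toy inputs standing in for a camera frame and an estimated flow field.
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
flow = np.ones((4, 6, 2), dtype=np.float32)
fused = early_fusion(rgb, flow)
```

The fused array would then be fed to a single encoder whose first layer accepts five input channels. A hybrid scheme would instead encode each modality separately and merge the features at a later stage.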

Navigating an Automated Driving Vehicle via the Early Fusion of Multi-Modality

Multimodal End-to-End Autonomous Driving

Autonomous Driving with Human Guided Image Feature Extraction

Multi-Modal Sensor Fusion-Based Deep Neural Network for End-to-End Autonomous Driving With Scene Understanding

Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

LiDAR-as-Camera for End-to-End Driving

Probabilistic End-to-End Vehicle Navigation in Complex Dynamic Environments with Multimodal Sensor Fusion

MMFN: Multi-Modal-Fusion-Net for End-to-End Driving

Exploiting Multi-Modal Fusion for Urban Autonomous Driving Using Latent Deep Reinforcement Learning

Autonomous Multi-Sensor Fusion Techniques for Environmental Perception in Self-Driving Vehicles

End-to-End Autonomous Driving With Semantic Depth Cloud Mapping and Multi-Agent

Multimodal Fusion Using Deep Learning Applied to Driver's Referencing of Outside-Vehicle Objects

M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving

Multi-Modal Neural Feature Fusion for Automatic Driving Through Perception-Aware Path Planning

Enhanced Perception for Autonomous Driving Using Semantic and Geometric Data Fusion

A Multimodal Perception-Driven Self Evolving Autonomous Ground Vehicle

Integrating Modular Pipelines with End-to-End Learning: A Hybrid Approach for Robust and Reliable Autonomous Driving Systems

Optical Flow Matters: an Empirical Comparative Study on Fusing Monocular Extracted Modalities for Better Steering

Humanlike Driving: Empirical Decision-Making System for Autonomous Vehicles