Abstract: Accurate 3D image recognition, critical for autonomous driving safety, is shifting from LiDAR-based point clouds to camera-based depth estimation, driven by cost considerations and the point cloud's limitations in detecting distant small objects. This research aims to enhance monocular depth estimation (MDE), which uses a single camera and is therefore highly cost-effective for acquiring 3D environmental data. In particular, this paper focuses on novel data augmentation methods designed to improve MDE accuracy. To address the limited quantity of MDE data, we propose synthetic-based augmentation techniques: Mask, Mask-Scale, and CutFlip. These synthetic-based augmentation strategies improved the accuracy of MDE models by 4.0% compared to the original dataset. Furthermore, this study introduces the RMS (Real-time Monocular Depth Estimation configuration considering Resolution, Efficiency, and Latency) algorithm, which optimizes neural networks for contemporary monocular depth estimation through a three-step process. First, it selects a model based on minimum latency and REL (absolute RELative error) criteria. Second, it refines the model's accuracy using various data augmentation techniques and loss functions. Finally, it compresses the refined model using quantization and pruning to minimize its size for efficient on-device real-time applications. In experiments with the RMS algorithm, the IEBins model achieved the most accurate REL within the required latency and size constraints, at 0.0480. The combination of the original dataset with Flip, Mask, and CutFlip augmentation, together with the SigLoss loss function, gave the best REL, 0.0461. Among the network compression techniques, FP16 was found to be the most effective, reducing the model size by 83.4% compared to the original with the least impact on REL and latency. Finally, the RMS algorithm was validated on an on-device autonomous driving platform, the NVIDIA Jetson AGX Orin, from which optimal deployment strategies were derived for various applications and scenarios requiring autonomous driving technologies.
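To make the REL metric and the first RMS step concrete, the following is a minimal sketch rather than the authors' implementation. It assumes REL is the standard mean of |prediction - ground truth| / ground truth over pixels with valid ground truth, and it assumes the first step amounts to filtering candidates by latency and size budgets and then taking the lowest REL. The function names, the `min_depth` threshold, and the candidate dictionary keys (`latency_ms`, `size_mb`, `rel`) are all hypothetical.

```python
import numpy as np


def abs_rel_error(pred, gt, min_depth=1e-3):
    """Mean absolute relative error over pixels with valid ground truth.

    Pixels whose ground-truth depth is <= min_depth are treated as missing
    (an assumption; sparse depth maps often mark missing values as 0).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > min_depth
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))


def select_model(candidates, max_latency_ms, max_size_mb):
    """Return the feasible candidate with the lowest REL, or None.

    Each candidate is a dict with hypothetical keys
    'name', 'latency_ms', 'size_mb', and 'rel'.
    """
    feasible = [
        c for c in candidates
        if c["latency_ms"] <= max_latency_ms and c["size_mb"] <= max_size_mb
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c["rel"])
```

In this sketch the latency and size budgets act as hard constraints and REL is the only ranking criterion; a weighted trade-off between the three would be an equally plausible reading of the abstract.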

Real-Time Stereo Image Depth Estimation Network with Group-Wise L1 Distance for Edge Devices Towards Autonomous Driving

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

A Robust Monocular Depth Estimation Framework Based on Light-Weight ERF-Pspnet for Day-Night Driving Scenes

Re-Parameterized Real-Time Stereo Matching Network Based on Mixed Cost Volumes Toward Autonomous Driving

A Scalable Real-time Semantic Segmentation Network for Autonomous Driving

Depth Generation Network: Estimating Real World Depth From Stereo And Depth Images

Depth Estimation of Traffic Scenes from Image Sequence Using Deep Learning

Real-time depth completion based on LiDAR-stereo for autonomous driving

Real-Time Monocular Depth Estimation Merging Vision Transformers on Edge Devices for AIoT

Real-time Monocular Depth Estimation on Embedded Systems

Real-Time Monocular Human Depth Estimation and Segmentation on Embedded Systems

Lightweight multi-scale convolutional neural network for real time stereo matching

RealNet: Combining Optimized Object Detection with Information Fusion Depth Estimation Co-Design Method on IoT

Depth-Guided Aggregation for Real-Time Binocular Depth Estimation Network

Driving Scene Perception Network: Real-time Joint Detection, Depth Estimation and Semantic Segmentation

TinyStereo: A Tiny Coarse-to-Fine Framework for Vision-Based Depth Estimation on Embedded GPUs

Joint Optimization of Depth and Ego-Motion for Intelligent Autonomous Vehicles

Synthetic Data Enhancement and Network Compression Technology of Monocular Depth Estimation for Real-Time Autonomous Driving System

On the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach

Novel Hybrid Neural Network for Dense Depth Estimation Using On-Board Monocular Images

Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations